From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linux Containers <containers@lists.linux-foundation.org>,
	linux-kernel@vger.kernel.org, Serge Hallyn <serge@hallyn.com>,
	Gao feng <gaofeng@cn.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 07/11] pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1
Date: Fri, 21 Dec 2012 15:11:33 +0100
Message-ID: <20121221141133.GA13805@redhat.com>
In-Reply-To: <87bodourqt.fsf@xmission.com>

On 12/20, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > On 11/16, Eric W. Biederman wrote:
> >>
> >> @@ -216,22 +216,15 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> >>
> >>  	/*
> >>  	 * sys_wait4() above can't reap the TASK_DEAD children.
> >> -	 * Make sure they all go away, see __unhash_process().
> >> +	 * Make sure they all go away, see free_pid().
> >>  	 */
> >>  	for (;;) {
> >> -		bool need_wait = false;
> >> -
> >> -		read_lock(&tasklist_lock);
> >> -		if (!list_empty(&current->children)) {
> >> -			__set_current_state(TASK_UNINTERRUPTIBLE);
> >> -			need_wait = true;
> >> -		}
> >> -		read_unlock(&tasklist_lock);
> >> -
> >> -		if (!need_wait)
> >> +		set_current_state(TASK_UNINTERRUPTIBLE);
> >> +		if (pid_ns->nr_hashed == 1)
> >>  			break;
> >>  		schedule();
> >>  	}
> >
> > I agree, the patch itself looks fine.
> >
> > But, with all other changes I do not understand this part at all.
> >
> > A task from the parent namespace can do setns + fork at any time
> > (until nr_hashed >= 0). So ->nr_hashed can be incremented again
> > after zap_pid_ns_processes() returns.

XXX: this creates the new pid P in this ns. Please see below...
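
For readers unfamiliar with the "setns + fork" injection mentioned above,
here is a rough userspace sketch. It is not code from the patch series.
The helper name inject_child() is hypothetical, and it assumes the target
namespace is reachable through a /proc/<pid>/ns/pid file and that the
caller has the privileges setns(2) requires. setns(fd, CLONE_NEWPID) does
not move the caller itself; only children forked afterwards get a pid in
the target namespace, which is how ->nr_hashed can grow from outside.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: join the pid namespace referred to by ns_path
 * (for example "/proc/<pid>/ns/pid") and fork one child into it.
 * Returns the child's pid as seen by the caller, or -1 on error.
 */
pid_t inject_child(const char *ns_path)
{
	int fd;
	pid_t pid;

	assert(ns_path != NULL);

	fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Only affects the pid namespace of children created from now on. */
	if (setns(fd, CLONE_NEWPID) < 0) {
		close(fd);
		return -1;
	}
	close(fd);

	pid = fork();
	if (pid == 0) {
		/* Child: keeps a pid allocated in the target namespace. */
		pause();
		_exit(0);
	}
	return pid;	/* -1 if fork() failed */
}
```

The child simply sleeps in pause(); as long as it has not exited and been
reaped, its pid stays allocated in the target namespace.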

> I want to talk about how alloc_pid and free_pid prevent nr_hashed
> from increasing once the last process has exited the pid namespace
> but that doesn't apply here.

Not sure I understand, but it seems you agree this can happen.

> > Or, we can sleep in TASK_UNINTERRUPTIBLE "forever" if this happens
> > after kill-them-all.
>
> Sleeping forever should be prevented by this chunk in free_pid:

Note that I said "forever", not forever ;)

>
> 		switch(--ns->nr_hashed) {
> 		case 1:
> 			/* When all that is left in the pid namespace
> 			 * is the reaper wake up the reaper.  The reaper
> 			 * may be sleeping in zap_pid_ns_processes().
> 			 */
> 			wake_up_process(ns->child_reaper);
>
>
> I admit it continues to be true that if an injected process or a
> debugged process does not exit we can block waiting for all of the
> processes to be reaped indefinitely.

Yes, I meant until the injected process exits.

> > Could you explain why do we need to wait at all? I can be easily
> > wrong, but at first glance the original reason for this wait has
> > gone away?
>
> It is very nice to know that when you do waitpid for the init process of
> a pid namespace that there are no other processes in the pid namespace.

OK, and I agree. But my point was, at least this _looks_ strange, because
ns->nr_hashed == 1 is not stable.

And in fact I think this is not strange, but simply wrong.

Please consider the XXX case above. Suppose that free_pid(P) happens
after ns->child_reaper has exited, so this pointer is dangling.
Suppose also that there is another injected pid, so nr_hashed == 2
and free_pid(P) drops it to 1. In this case
wake_up_process(ns->child_reaper) means use-after-free, no?

Oleg.
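
The sequence described above can be replayed with a small userspace toy
model. This is only a sketch under stated assumptions: the struct and
function names (task_model, pid_ns_model, free_pid_model, replay_scenario)
are invented for illustration, the model is single-threaded, and a "freed"
flag stands in for the reaper's task_struct having been released. It
mirrors the quoted free_pid() chunk: decrement nr_hashed and wake
child_reaper when the count reaches 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; names are illustrative only. */
struct task_model {
	bool freed;	/* true once the task has exited and been released */
};

struct pid_ns_model {
	int nr_hashed;	/* pids currently allocated in the namespace */
	struct task_model *child_reaper;
};

/*
 * Models the quoted free_pid() chunk. Returns true if the wakeup issued
 * when nr_hashed reaches 1 would touch an already-freed reaper.
 */
bool free_pid_model(struct pid_ns_model *ns)
{
	assert(ns->nr_hashed > 0);
	if (--ns->nr_hashed == 1)
		return ns->child_reaper->freed;	/* wake_up_process() site */
	return false;
}

/* Replays the scenario; returns true if the stale wakeup occurs. */
bool replay_scenario(void)
{
	struct task_model reaper = { .freed = false };
	struct pid_ns_model ns = { .nr_hashed = 1, .child_reaper = &reaper };
	bool stale;

	/* zap_pid_ns_processes() saw nr_hashed == 1 and returned. */
	ns.nr_hashed += 2;		/* two injected pids: P and one more */

	/* The reaper exits: its own pid is freed (3 -> 2), no wakeup. */
	stale = free_pid_model(&ns);
	assert(!stale);
	reaper.freed = true;		/* ns->child_reaper now dangles */

	/* free_pid(P): 2 -> 1, which triggers the wakeup of the reaper. */
	return free_pid_model(&ns);
}
```

In this model replay_scenario() reports the stale wakeup because nothing
clears or pins child_reaper before the count drops back to 1.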
