All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Louis Rilling <louis.rilling@kerlabs.com>,
	Mike Galbraith <efault@gmx.de>
Subject: Re: [PATCH 2/3] pidns: Guarantee that the pidns init will be the last pidns process reaped.
Date: Mon, 21 May 2012 14:44:14 +0200	[thread overview]
Message-ID: <20120521124414.GA20391@redhat.com> (raw)
In-Reply-To: <87zk95kper.fsf@xmission.com>

On 05/18, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> >> I think there is something very compelling about your solution,
> >> we do need my bit about making the init process ignore SIGCHLD
> >> so all of init's children self reap.
> >
> > Not sure I understand. This can work with or without 3/3 which
> > changes zap_pid_ns_processes() to ignore SIGCHLD. And just in
> > case, I think 3/3 is fine.
>
> The only issue I see is that without 3/3 we might have processes that
> on one wait(2)s for and so will never have release_task called on.
>
> We do have the wait loop

Yes, and we need this loop anyway, even if SIGCHLD is ignored.
It is possible that we already have a EXIT_ZOMBIE child(s) when
zap_pid_ns_processes().

> but I think there is a race possible there.

Hmm. I do not see any race, but perhaps I missed something.
I think we can trust -ECHILD, or do_wait() is buggy.

Hmm. But there is another (off-topic) problem, security_task_wait()
can return an error if there are some security policy problems...
OK, this shouldn't happen I hope.

> > And once again, this wait_event() + __wake_up_parent() is very
> > simple and straightforward, we can cleanup this code later if
> > needed.
>
> Yes, and it doesn't when you do an UNINTERRUPTIBLE sleep with
> an INTERRUPTIBLE wake up unless I misread the code.

Yes. so we need wait_event_interruptible() or __unhash_process()
should use __wake_up_sync_key(wait_chldexit).

> > Yes. This is the known oddity. We always notify the tracer if the
> > leader exits, even if !thread_group_empty(). But after that the
> > tracer can't detach, and it can't do do_wait(WEXITED).
> >
> > The problem is not that we can't "fix" this. Just any discussed
> > fix adds the subtle/incompatible user-visible change.
>
> Yes and that is nasty.

Agreed. ptrace API is nasty ;)

> and moving detach_pid so we don't have to be super careful about
> where we call task_active_pid_ns.

Yes, I was thinking about this change too,

> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -189,6 +189,17 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  		rc = sys_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>
> +	read_lock(&tasklist_lock);
> +	for (;;) {
> +		__set_current_state(TASK_INTERRUPTIBLE);
> +		if (list_empty(&current->children))
> +			break;
> +		read_unlock(&tasklist_lock);
> +		schedule();

OK, but then it makes sense to add clear_thread_flag(TIF_SIGPENDING)
before schedule, to avoid the busy-wait loop (like the sys_wait4 loop
does). Or simply use TASK_UNINTERRUPTIBLE, I do not think it is that
important to "fool" /proc/loadavg. But I am fine either way.

Maybe you can also add "ifdef CONFIG_PID_NS" into __unhash_process(),
but this is minor too.

Oleg.


  reply	other threads:[~2012-05-21 12:45 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-28  9:19 [RFC PATCH] namespaces: fix leak on fork() failure Mike Galbraith
2012-04-28 14:26 ` Oleg Nesterov
2012-04-29  4:13   ` Mike Galbraith
2012-04-29  7:57   ` Eric W. Biederman
2012-04-29  9:49     ` Mike Galbraith
2012-04-29 16:58     ` Oleg Nesterov
2012-04-30  2:59       ` Eric W. Biederman
2012-04-30  3:25         ` Mike Galbraith
2012-05-02 12:40         ` Oleg Nesterov
2012-05-02 17:37           ` Eric W. Biederman
2012-04-30  3:01       ` [PATCH] " Mike Galbraith
     [not found]         ` <m1zk9rmyh4.fsf@fess.ebiederm.org>
2012-05-01 20:42           ` Andrew Morton
2012-05-03  3:12             ` Mike Galbraith
2012-05-03 14:56               ` Mike Galbraith
2012-05-04  4:27                 ` Mike Galbraith
2012-05-04  7:55                   ` Eric W. Biederman
2012-05-04  8:34                     ` Mike Galbraith
2012-05-04  9:45                     ` Mike Galbraith
2012-05-04 14:13                       ` Eric W. Biederman
2012-05-04 14:49                         ` Mike Galbraith
2012-05-04 15:36                           ` Eric W. Biederman
2012-05-04 16:57                             ` Mike Galbraith
2012-05-04 20:29                               ` Eric W. Biederman
2012-05-05  5:56                                 ` Mike Galbraith
2012-05-05  6:08                                   ` Mike Galbraith
2012-05-05  7:12                                     ` Mike Galbraith
2012-05-05 11:37                                       ` Eric W. Biederman
2012-05-07 21:51                                       ` [PATCH] vfs: Speed up deactivate_super for non-modular filesystems Eric W. Biederman
2012-05-07 22:17                                         ` Al Viro
2012-05-07 23:56                                           ` Paul E. McKenney
2012-05-08  1:07                                             ` Eric W. Biederman
2012-05-08  4:53                                               ` Mike Galbraith
2012-05-09  7:55                                               ` Nick Piggin
2012-05-09 11:02                                                 ` Eric W. Biederman
2012-05-09 11:02                                                   ` Eric W. Biederman
2012-05-15  8:40                                                   ` Nick Piggin
2012-05-16  0:34                                                     ` Eric W. Biederman
2012-05-16  0:34                                                       ` Eric W. Biederman
2012-05-09 13:59                                                 ` Paul E. McKenney
2012-05-04  8:03                 ` [PATCH] Re: [RFC PATCH] namespaces: fix leak on fork() failure Eric W. Biederman
2012-05-04  8:19                   ` Mike Galbraith
2012-05-04  8:54                     ` Mike Galbraith
2012-05-07  0:32             ` [PATCH 0/3] pidns: Closing the pid namespace exit race Eric W. Biederman
2012-05-07  0:33               ` [PATCH 1/3] pidns: Use task_active_pid_ns in do_notify_parent Eric W. Biederman
2012-05-07  0:35               ` [PATCH 2/3] pidns: Guarantee that the pidns init will be the last pidns process reaped Eric W. Biederman
2012-05-08 22:50                 ` Andrew Morton
2012-05-16 18:39                 ` Oleg Nesterov
2012-05-16 19:34                   ` Oleg Nesterov
2012-05-16 20:54                   ` Eric W. Biederman
2012-05-17 17:00                     ` Oleg Nesterov
2012-05-17 21:46                       ` Eric W. Biederman
2012-05-18 12:39                         ` Oleg Nesterov
2012-05-19  0:03                           ` Eric W. Biederman
2012-05-21 12:44                             ` Oleg Nesterov [this message]
2012-05-22  0:16                               ` Eric W. Biederman
2012-05-22  0:20                               ` [PATCH] pidns: Guarantee that the pidns init will be the last pidns process reaped. v2 Eric W. Biederman
2012-05-22 16:54                                 ` Oleg Nesterov
2012-05-22 19:23                                 ` Andrew Morton
2012-05-23 14:52                                   ` Oleg Nesterov
2012-05-25 15:15                                     ` [PATCH -mm] pidns-guarantee-that-the-pidns-init-will-be-the-last-pidns-process-r eaped-v2-fix-fix Oleg Nesterov
2012-05-25 15:59                                       ` [PATCH -mm 0/1] pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper Oleg Nesterov
2012-05-25 16:00                                         ` [PATCH -mm 1/1] " Oleg Nesterov
2012-05-25 21:43                                           ` Eric W. Biederman
2012-05-27 19:10                                             ` [PATCH v2 -mm 0/1] " Oleg Nesterov
2012-05-27 19:11                                               ` [PATCH v2 -mm 1/1] " Oleg Nesterov
2012-05-29  6:34                                                 ` Eric W. Biederman
2012-05-25 21:25                                       ` [PATCH -mm] pidns-guarantee-that-the-pidns-init-will-be-the-last-pidns-process-r eaped-v2-fix-fix Eric W. Biederman
2012-05-27 18:41                                         ` [PATCH -mm v2] " Oleg Nesterov
2012-05-07  0:35               ` [PATCH 3/3] pidns: Make killed children autoreap Eric W. Biederman
2012-05-08 22:51                 ` Andrew Morton
2012-04-30 13:57 ` [RFC PATCH] namespaces: fix leak on fork() failure Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120521124414.GA20391@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=efault@gmx.de \
    --cc=gorcunov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=louis.rilling@kerlabs.com \
    --cc=xemul@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.