linux-kernel.vger.kernel.org archive mirror
From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
	Lennart Poettering <lennart@poettering.net>,
	Kay Sievers <kay.sievers@vrfy.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	John Stultz <john.stultz@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Nicolas Pitre <nicolas.pitre@linaro.org>,
	Michal Hocko <mhocko@suse.com>,
	Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>,
	Mateusz Guzik <mguzik@redhat.com>,
	linux-kernel@vger.kernel.org,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Konstantin Khorenko <khorenko@virtuozzo.com>
Subject: Re: setns() && PR_SET_CHILD_SUBREAPER
Date: Tue, 24 Jan 2017 15:07:38 +0100
Message-ID: <20170124140738.GA21034@redhat.com>
In-Reply-To: <87tw8p8wo8.fsf@xmission.com>

On 01/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > Suppose we have a process P in the root namespace and another namespace X.
> >
> > P does setns() and enters the X namespace.
> > P forks a child C.
> >
> > C forks a grandchild G.
> > C exits.
> >
> > The question is, where should we reparent the grandchild G? In the normal
> > case it will be reparented to X->child_reaper and this looks correct.
> >
> > But let's suppose that P runs with the ->has_child_subreaper bit set. In
> > this case it will be reparented to P's sub-reaper or a global init, and
> > given that P can't control its ->has_child_subreaper flag this does not
> > look right to me.
> >
> > I can make a simple patch but perhaps I missed something or we actually
> > want this (imo strange) behaviour?
>
> We definitely do not want a child to be reparented out of a pid namespace
> when the pid namespace has a perfectly fine child_reaper.
>
> The special case for the init_task in find_new_reaper appears to be the
> instance of this problem that was considered in the code.

Actually we should blame the same_thread_group(reaper, child_reaper) check:
it should have ensured we could not cross the namespaces, but it is not
enough because this logic predates setns().

> Semantically what we want to do is walk up the parents in the process
> tree.  If a parent has is_child_subreaper we stop at it.  If in the
> transition from one parent to the next we are switching pid namespaces
> we want the reaper from the pid namespace.

Yes, this is what I have in mind, see the patch below. I need to re-check
it and update the comment to explain why we can't simply check child_reaper
as we currently do.
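
To make the difference concrete, here is a minimal user-space sketch of the
two walks applied to the P/C/G scenario quoted above. It is a model, not
kernel code: struct model_task, walk_old(), walk_new() and run_scenario()
are hypothetical names, same_thread_group() is reduced to pointer equality
(single-threaded processes), and find_alive_thread() is reduced to an
"alive" flag. The sub-reaper S above P is an assumption added so that the
old walk has something to find.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the task_struct fields involved. */
struct model_task {
	struct model_task *real_parent;
	unsigned int level;        /* models task_pid(task)->level */
	bool is_child_subreaper;   /* models signal->is_child_subreaper */
	bool alive;                /* models find_alive_thread() != NULL */
};

/* Models init_task: the root of the process tree, its own parent. */
static struct model_task model_init_task = { &model_init_task, 0, false, true };

/* Current logic: start at father, stop only at the namespace's reaper. */
static struct model_task *walk_old(struct model_task *father,
				   struct model_task *child_reaper)
{
	struct model_task *reaper;

	for (reaper = father; reaper != child_reaper;
	     reaper = reaper->real_parent) {
		if (reaper == &model_init_task)
			break;
		if (reaper->is_child_subreaper && reaper->alive)
			return reaper;
	}
	return child_reaper;
}

/* Proposed logic: start at father's parent, stop when the pid level changes. */
static struct model_task *walk_new(struct model_task *father,
				   struct model_task *child_reaper)
{
	struct model_task *reaper;
	unsigned int level = father->level;

	for (reaper = father->real_parent; reaper->level == level;
	     reaper = reaper->real_parent) {
		if (reaper == &model_init_task)
			break;
		if (reaper->is_child_subreaper && reaper->alive)
			return reaper;
	}
	return child_reaper;
}

enum adopter { ADOPT_X_INIT, ADOPT_SUBREAPER, ADOPT_OTHER };

/* Builds the scenario and reports who adopts G when C exits. */
static enum adopter run_scenario(bool use_new_walk)
{
	struct model_task global_init = { &model_init_task, 0, false, true };
	struct model_task s = { &global_init, 0, true, true };       /* P's sub-reaper (assumed) */
	struct model_task p = { &s, 0, false, true };                /* called setns() into X */
	struct model_task x_init = { &global_init, 1, false, true }; /* X->child_reaper */
	struct model_task c = { &p, 1, false, false };               /* exiting father of G */
	struct model_task *r;

	r = use_new_walk ? walk_new(&c, &x_init) : walk_old(&c, &x_init);
	if (r == &x_init)
		return ADOPT_X_INIT;
	return r == &s ? ADOPT_SUBREAPER : ADOPT_OTHER;
}
```

In this model run_scenario(false) reports the sub-reaper outside X, while
run_scenario(true) reports X's own reaper, because P sits at a different
pid level than C and the new loop stops there.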

This way we can start the search from father->real_parent, but the comment
above the "reaper == &init_task" check is no longer correct: we always need
this check, although perhaps is_idle_task(reaper) would be better.

> As I recall has_child_subreaper was just supposed to be an optimization
> so the common case would not have to walk up the process tree when
> finding its parent.

Yep.

> If we retain any optimizations such as has_child_subreaper please
> consider the case where a process with is_child_subreaper set exits,
> and what happens to its children.

Yes, in this case it should not have any effect. Well, there is another
corner case, perhaps we should turn

		if (!reaper->signal->is_child_subreaper)
			continue;

into

		if (!reaper->signal->is_child_subreaper) {
			if (!reaper->signal->has_child_subreaper)
				break;
			continue;
		}

this looks a bit more correct if the exited "is_child_subreaper" process
was forked, and after that its parent called prctl(PR_SET_CHILD_SUBREAPER).
But I think we do not care, and Pavel is going to eliminate the case where
a child of an is_child_subreaper task can run without the
has_child_subreaper flag set.
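
For reference, a small user-space model of the two loop bodies shown above.
Everything in it is hypothetical (struct model_task, walk(), chain_result()),
the namespace level check is left out, and find_alive_thread() is reduced to
an "alive" flag. The chain Q <- M <- R <- F was picked only because the two
bodies disagree on it; it is not claimed to be the exact case described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the signal_struct bits the loop looks at. */
struct model_task {
	struct model_task *real_parent;
	bool is_child_subreaper;
	bool has_child_subreaper;
	bool alive;                /* models find_alive_thread() != NULL */
};

/* Models init_task: the root of the process tree, its own parent. */
static struct model_task model_init_task = { &model_init_task, false, false, true };

/*
 * stop_without_flag == false models the first loop body,
 * stop_without_flag == true models the second one with the extra break.
 */
static struct model_task *walk(struct model_task *father,
			       struct model_task *child_reaper,
			       bool stop_without_flag)
{
	struct model_task *reaper;

	for (reaper = father->real_parent; ; reaper = reaper->real_parent) {
		if (reaper == &model_init_task)
			break;
		if (!reaper->is_child_subreaper) {
			if (stop_without_flag && !reaper->has_child_subreaper)
				break;
			continue;
		}
		if (reaper->alive)
			return reaper;
	}
	return child_reaper;
}

/* Returns 1 if Q adopts F's children, 0 if the namespace reaper does. */
static int chain_result(bool stop_without_flag)
{
	struct model_task ns_init = { &model_init_task, false, false, true };
	struct model_task q = { &ns_init, true, false, true };  /* sub-reaper, alive */
	struct model_task m = { &q, false, false, true };       /* forked before q's prctl() */
	struct model_task r = { &m, true, false, false };       /* sub-reaper, no live thread */
	struct model_task f = { &r, false, true, false };       /* exiting father */

	return walk(&f, &ns_init, stop_without_flag) == &q;
}
```

With the first body the walk passes through M and reaches Q; with the second
it stops at M, which never inherited the flag, and falls back to the
namespace reaper.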

So what do you think about the patch below?

Oleg.

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -569,15 +569,15 @@ static struct task_struct *find_new_reaper(struct task_struct *father,
 		return thread;
 
 	if (father->signal->has_child_subreaper) {
+		unsigned int level = task_pid(father)->level;
 		/*
 		 * Find the first ->is_child_subreaper ancestor in our pid_ns.
-		 * We start from father to ensure we can not look into another
-		 * namespace, this is safe because all its threads are dead.
+		 * We check pid->level, this is slightly more efficient than
+		 * task_active_pid_ns(reaper) != task_active_pid_ns(father).
 		 */
-		for (reaper = father;
-		     !same_thread_group(reaper, child_reaper);
+		for (reaper = father->real_parent;
+		     task_pid(reaper)->level == level;
 		     reaper = reaper->real_parent) {
-			/* call_usermodehelper() descendants need this check */
 			if (reaper == &init_task)
 				break;
 			if (!reaper->signal->is_child_subreaper)

Thread overview: 19+ messages
2017-01-19 16:43 [PATCH] prctl: propagate has_child_subreaper flag to every descendant Pavel Tikhomirov
2017-01-20 18:14 ` Oleg Nesterov
2017-01-22 10:00   ` Pavel Tikhomirov
2017-01-22 10:11   ` Pavel Tikhomirov
2017-01-23 11:55     ` Oleg Nesterov
2017-01-23 12:52       ` task_is_descendant() cleanup Oleg Nesterov
2017-01-25 21:59         ` Kees Cook
2017-01-30 13:49           ` Oleg Nesterov
2017-01-23 14:30       ` [PATCH] prctl: propagate has_child_subreaper flag to every descendant Pavel Tikhomirov
2017-01-23 16:06         ` Oleg Nesterov
2017-01-23 11:57 ` [PATCH] introduce the walk_process_tree() helper Oleg Nesterov
2017-01-23 12:07   ` Oleg Nesterov
2017-01-24 15:01   ` Pavel Tikhomirov
2017-01-23 16:44 ` setns() && PR_SET_CHILD_SUBREAPER Oleg Nesterov
2017-01-23 18:21   ` Eric W. Biederman
2017-01-24 14:07     ` Oleg Nesterov [this message]
2017-01-24 15:24       ` Eric W. Biederman
2017-01-30 18:16         ` Oleg Nesterov
2017-01-30 18:17         ` [PATCH] exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction Oleg Nesterov
