From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
	Lennart Poettering <lennart@poettering.net>,
	Kay Sievers <kay.sievers@vrfy.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	John Stultz <john.stultz@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Nicolas Pitre <nicolas.pitre@linaro.org>,
	Michal Hocko <mhocko@suse.com>,
	Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>,
	Mateusz Guzik <mguzik@redhat.com>,
	linux-kernel@vger.kernel.org,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Konstantin Khorenko <khorenko@virtuozzo.com>
Subject: Re: setns() && PR_SET_CHILD_SUBREAPER
Date: Tue, 24 Jan 2017 15:07:38 +0100
Message-ID: <20170124140738.GA21034@redhat.com>
In-Reply-To: <87tw8p8wo8.fsf@xmission.com>

On 01/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > Suppose we have a process P in the root namespace and another namespace X.
> >
> > P does setns() and enters the X namespace.
> > P forks a child C.
> >
> > C forks a grandchild G.
> > C exits.
> >
> > The question is, where should we reparent the grandchild G? In the normal
> > case it will be reparented to X->child_reaper and this looks correct.
> >
> > But let's suppose that P runs with the ->has_child_subreaper bit set. In
> > this case it will be reparented to P's sub-reaper or a global init, and
> > given that P can't control its ->has_child_subreaper flag this does not
> > look right to me.
> >
> > I can make a simple patch but perhaps I missed something or we actually
> > want this (imo strange) behaviour?
>
> We definitely do not want a child to be reparented out of a pid namespace
> when the pid namespace has a perfectly fine child_reaper.
>
> The special case for the init_task in find_new_reaper appears to be the
> instance of this problem that was considered in the code.

Actually we should blame the same_thread_group(reaper, child_reaper) check.
It was supposed to ensure that we can not cross the namespaces, but it is
not enough because this logic predates setns().

> Semantically what we want to do is walk up the parents in the process
> tree.  If a parent has is_child_subreaper we stop at it.  If in the
> transition from one parent to the next we are switching pid namespaces,
> we want the reaper from the pid namespace.

Yes, this is what I have in mind, see the patch below. I need to re-check
it and update the comment to explain why we can't simply check child_reaper
as we currently do.

This way we can start the search from father->real_parent, but the comment
above the "reaper == &init_task" check is no longer correct: we always need
this check, although perhaps is_idle_task(reaper) would be better.
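
A user-space sketch of the difference, for illustration only. It is not
kernel code: struct task and both find_reaper_*() helpers are made-up
stand-ins, pointer comparison replaces same_thread_group(), and exiting
threads are ignored. The tree models the scenario from the start of the
thread, assuming S is a sub-reaper at level 0, P is its child, and C was
forked by P after setns() into X, so C sits at level 1 while
C->real_parent is P.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct; not the kernel type. */
struct task {
	struct task *real_parent;
	unsigned int level;		/* models task_pid(task)->level */
	bool is_child_subreaper;	/* models ->signal->is_child_subreaper */
};

/* Old walk: start at father, stop only when child_reaper is reached. */
static struct task *find_reaper_old(struct task *father,
				    struct task *child_reaper,
				    struct task *init_task)
{
	struct task *reaper;

	for (reaper = father; reaper != child_reaper;
	     reaper = reaper->real_parent) {
		if (reaper == init_task)
			break;
		if (reaper->is_child_subreaper)
			return reaper;
	}
	return child_reaper;
}

/* Patched walk: start at father->real_parent, stop when the level changes. */
static struct task *find_reaper_new(struct task *father,
				    struct task *child_reaper,
				    struct task *init_task)
{
	unsigned int level = father->level;
	struct task *reaper;

	for (reaper = father->real_parent; reaper->level == level;
	     reaper = reaper->real_parent) {
		if (reaper == init_task)
			break;
		if (reaper->is_child_subreaper)
			return reaper;
	}
	return child_reaper;
}

/* Builds the S -> P -> C tree described above; returns 1 if both checks hold. */
int setns_scenario(void)
{
	struct task init_task = { .real_parent = &init_task, .level = 0 };
	struct task s = { .real_parent = &init_task, .level = 0,
			  .is_child_subreaper = true };
	struct task p = { .real_parent = &s, .level = 0 };
	struct task x_init = { .real_parent = &init_task, .level = 1 };
	struct task c = { .real_parent = &p, .level = 1 };

	/* The old walk leaves X and picks the sub-reaper S at level 0. */
	assert(find_reaper_old(&c, &x_init, &init_task) == &s);
	/* The level-bounded walk stops at P and falls back to X's reaper. */
	assert(find_reaper_new(&c, &x_init, &init_task) == &x_init);
	return 1;
}
```

Under these assumptions setns_scenario() returns 1: the old walk hands G
to S outside of X, while the level-bounded walk keeps G inside X.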

> As I recall has_child_subreaper was just supposed to be an optimization
> so the common case would not have to walk up the process tree when
> finding its parent.

Yep.

> If we retain any optimizations such as has_child_subreaper please
> consider the case where a process with is_child_subreaper set exits,
> and what happens to its children.

Yes, in this case it should not have any effect. Well, there is another
corner case, perhaps we should turn

		if (!reaper->signal->is_child_subreaper)
			continue;

into

		if (!reaper->signal->is_child_subreaper) {
			if (!reaper->signal->has_child_subreaper)
				break;
			continue;
		}

this looks a bit more correct if the exited "is_child_subreaper" process
was forked, and after that its parent called prctl(PR_SET_CHILD_SUBREAPER).
But I think we do not care, and Pavel is going to eliminate the case when
a child of an is_child_subreaper task can run without the
has_child_subreaper flag set.
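
To make the effect of the extra "break" concrete, here is a small
user-space model of the two loop bodies. Everything in it is an
assumption made for illustration: struct task_model, walk() and the
early_break switch are invented names, the flags are set by hand, and it
does not claim that the kernel can actually reach this state. M stands
for an ancestor with neither flag set, sitting between the dying father
F and a sub-reaper Q.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct plus the two signal_struct flags. */
struct task_model {
	struct task_model *real_parent;
	unsigned int level;
	bool is_child_subreaper;
	bool has_child_subreaper;
};

/* early_break selects the second variant quoted above. */
static struct task_model *walk(struct task_model *father,
			       struct task_model *child_reaper,
			       struct task_model *init_task,
			       bool early_break)
{
	unsigned int level = father->level;
	struct task_model *reaper;

	if (!father->has_child_subreaper)
		return child_reaper;

	for (reaper = father->real_parent; reaper->level == level;
	     reaper = reaper->real_parent) {
		if (reaper == init_task)
			break;
		if (!reaper->is_child_subreaper) {
			/* No flag on this ancestor: give up instead of climbing. */
			if (early_break && !reaper->has_child_subreaper)
				break;
			continue;
		}
		return reaper;
	}
	return child_reaper;
}

/* Chain F -> M -> Q -> init; returns 1 if the two variants differ as expected. */
int early_break_changes_result(void)
{
	struct task_model init_task = { .real_parent = &init_task };
	struct task_model init = { .real_parent = &init_task };
	struct task_model q = { .real_parent = &init,
				.is_child_subreaper = true };
	struct task_model m = { .real_parent = &q };
	struct task_model f = { .real_parent = &m,
				.has_child_subreaper = true };

	/* Plain "continue": the walk climbs past M and finds Q. */
	assert(walk(&f, &init, &init_task, false) == &q);
	/* With the extra "break": the walk stops at M, child_reaper is used. */
	assert(walk(&f, &init, &init_task, true) == &init);
	return 1;
}
```

In this model the only difference between the variants is whether an
ancestor without has_child_subreaper ends the search.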

So what do you think about the patch below?

Oleg.

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -569,15 +569,15 @@ static struct task_struct *find_new_reaper(struct task_struct *father,
 		return thread;
 
 	if (father->signal->has_child_subreaper) {
+		unsigned int level = task_pid(father)->level;
 		/*
 		 * Find the first ->is_child_subreaper ancestor in our pid_ns.
-		 * We start from father to ensure we can not look into another
-		 * namespace, this is safe because all its threads are dead.
+		 * We check pid->level, this is slightly more efficient than
+		 * task_active_pid_ns(reaper) != task_active_pid_ns(father).
 		 */
-		for (reaper = father;
-		     !same_thread_group(reaper, child_reaper);
+		for (reaper = father->real_parent;
+		     task_pid(reaper)->level == level;
 		     reaper = reaper->real_parent) {
-			/* call_usermodehelper() descendants need this check */
 			if (reaper == &init_task)
 				break;
 			if (!reaper->signal->is_child_subreaper)
