From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750917AbdAXOHn (ORCPT ); Tue, 24 Jan 2017 09:07:43 -0500
Received: from mx1.redhat.com ([209.132.183.28]:34108 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750708AbdAXOHm (ORCPT ); Tue, 24 Jan 2017 09:07:42 -0500
Date: Tue, 24 Jan 2017 15:07:38 +0100
From: Oleg Nesterov
To: "Eric W. Biederman"
Cc: Pavel Tikhomirov, Lennart Poettering, Kay Sievers, Ingo Molnar,
	Peter Zijlstra, Andrew Morton, Cyrill Gorcunov, John Stultz,
	Thomas Gleixner, Nicolas Pitre, Michal Hocko, Stanislav Kinsburskiy,
	Mateusz Guzik, linux-kernel@vger.kernel.org, Pavel Emelyanov,
	Konstantin Khorenko
Subject: Re: setns() && PR_SET_CHILD_SUBREAPER
Message-ID: <20170124140738.GA21034@redhat.com>
References: <20170119164346.4214-1-ptikhomirov@virtuozzo.com>
	<20170123164420.GA2145@redhat.com>
	<87tw8p8wo8.fsf@xmission.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87tw8p8wo8.fsf@xmission.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.25]); Tue, 24 Jan 2017 14:07:43 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/24, Eric W. Biederman wrote:
>
> Oleg Nesterov writes:
>
> > Suppose we have a process P in the root namespace and another namespace X.
> >
> > P does setns() and enters the X namespace.
> > P forks a child C.
> >
> > C forks a grandchild G.
> > C exits.
> >
> > The question is, where should we reparent the grandchild G? In the normal
> > case it will be reparented to X->child_reaper and this looks correct.
> >
> > But let's suppose that P runs with the ->has_child_subreaper bit set.
> > In this case it will be reparented to P's sub-reaper or a global init, and
> > given that P can't control its ->has_child_subreaper flag this does not
> > look right to me.
> >
> > I can make a simple patch but perhaps I missed something or we actually
> > want this (imo strange) behaviour?
>
> We definitely do not want a child to be reparented out of a pid namespace
> when the pid namespace has a perfectly fine child_reaper.
>
> The special case for the init_task in find_new_reaper appears to be the
> instance of this problem that was considered in the code.

Actually we should blame the same_thread_group(reaper, child_reaper) check,
it should have ensured we could not cross the namespaces, but it is not
enough because this logic predates setns().

> Semantically what we want to do is walk up the parents in the process
> tree.  If a parent has is_child_subreaper we stop at it.  If in the
> transition from one parent to the next we are switching pid namespaces
> we want the reaper from the pid namespace.

Yes, this is what I have in mind, see the patch below. I need to re-check it
and update the comment to explain why we can't simply check child_reaper as
we currently do.

This way we can start the search from father->real_parent, but the comment
above the "reaper == &init_task" check is no longer correct: we always need
this check, although perhaps is_idle_task(reaper) would be better.

> As I recall has_child_subreaper was just supposed to be an optimization
> so the common case would not have to walk up the process tree when
> finding its parent.

Yep.

> If we retain any optimizations such as has_child_subreaper please
> consider the case where a process with is_child_subreaper set exits,
> and what happens to its children.

Yes, in this case it should not have any effect.
Well, there is another corner case, perhaps we should turn

	if (!reaper->signal->is_child_subreaper)
		continue;

into

	if (!reaper->signal->is_child_subreaper) {
		if (!reaper->signal->has_child_subreaper)
			break;
		continue;
	}

this looks a bit more correct if the exited "is_child_subreaper" process was
forked, and after that its parent called prctl(SET_CHILD_SUBREAPER). But I
think we do not care and Pavel is going to eliminate the case when a child of
is_child_subreaper task can run without has_child_subreaper flag set.

So what do you think about the patch below?

Oleg.

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -569,15 +569,15 @@ static struct task_struct *find_new_reaper(struct task_struct *father,
 			return thread;
 
 	if (father->signal->has_child_subreaper) {
+		unsigned int level = task_pid(father)->level;
 		/*
 		 * Find the first ->is_child_subreaper ancestor in our pid_ns.
-		 * We start from father to ensure we can not look into another
-		 * namespace, this is safe because all its threads are dead.
+		 * We check pid->level, this is slightly more efficient than
+		 * task_active_pid_ns(reaper) != task_active_pid_ns(father).
 		 */
-		for (reaper = father;
-		     !same_thread_group(reaper, child_reaper);
+		for (reaper = father->real_parent;
+		     task_pid(reaper)->level == level;
 		     reaper = reaper->real_parent) {
-			/* call_usermodehelper() descendants need this check */
 			if (reaper == &init_task)
 				break;
 			if (!reaper->signal->is_child_subreaper)
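
For illustration only, the walk that the patched loop performs can be modelled
in userspace roughly as below. This is a sketch under stated assumptions, not
kernel code: struct task, its fields, pick_reaper() and the idle parameter are
hypothetical stand-ins for task_struct, task_pid(t)->level,
signal->is_child_subreaper, the namespace's child_reaper and &init_task. The
has_child_subreaper fast-path gate and the check for a live thread in the
chosen reaper are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a task; not the kernel's task_struct. */
struct task {
	struct task *real_parent;	/* the idle task points at itself */
	unsigned int level;		/* pid namespace nesting level, 0 == root */
	bool is_child_subreaper;
	struct task *ns_reaper;		/* init of this task's pid namespace */
};

/*
 * Pick the new parent for the children of an exiting "father".
 * Walk up from father's parent and stop at the first subreaper that is
 * still at father's pid namespace level. Leaving that level, or reaching
 * the idle task, ends the search and falls back to the namespace reaper.
 */
static struct task *pick_reaper(const struct task *father,
				const struct task *idle)
{
	struct task *reaper;

	assert(father != NULL && idle != NULL);

	for (reaper = father->real_parent;
	     reaper->level == father->level;
	     reaper = reaper->real_parent) {
		if (reaper == idle)
			break;
		if (reaper->is_child_subreaper)
			return reaper;
	}

	return father->ns_reaper;
}
```

In this model, if P sits at level 0 under a subreaper S and its child C sits at
level 1 after setns(), pick_reaper() for C leaves the loop immediately because
P's level differs from C's, so G goes to the level 1 namespace reaper and not
to S.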