From: ebiederm@xmission.com (Eric W. Biederman)
To: Oleg Nesterov
Cc: Pavel Tikhomirov, Lennart Poettering, Kay Sievers, Ingo Molnar,
    Peter Zijlstra, Andrew Morton, Cyrill Gorcunov, John Stultz,
    Thomas Gleixner, Nicolas Pitre, Michal Hocko, Stanislav Kinsburskiy,
    Mateusz Guzik, linux-kernel@vger.kernel.org, Pavel Emelyanov,
    Konstantin Khorenko
Subject: Re: setns() && PR_SET_CHILD_SUBREAPER
Date: Tue, 24 Jan 2017 07:21:11 +1300
Message-ID: <87tw8p8wo8.fsf@xmission.com>
In-Reply-To: <20170123164420.GA2145@redhat.com> (Oleg Nesterov's message of "Mon, 23 Jan 2017 17:44:20 +0100")
References: <20170119164346.4214-1-ptikhomirov@virtuozzo.com> <20170123164420.GA2145@redhat.com>

Oleg Nesterov writes:

> And this discussion reminds me again that I do not understand how setns()
> and PR_SET_CHILD_SUBREAPER should play together...

Add cc's.

I agree that they are currently playing together incorrectly.

> Suppose we have a process P in the root namespace and another namespace X.
>
> P does setns() and enters the X namespace.
> P forks a child C.
>
> C forks a grandchild G.
> C exits.
>
> The question is, where should we reparent the grandchild G? In the normal
> case it will be reparented to X->child_reaper and this looks correct.
>
> But lets suppose that P runs with the ->has_child_subreaper bit set. In
> this case it will be reparented to P's sub-reaper or a global init, and
> given that P can't control its ->has_child_subreaper flag this does not
> look right to me.
>
> I can make a simple patch but perhaps I missed something or we actually
> want this (imo strange) behaviour?

We definitely do not want a child to be reparented out of a pid
namespace when the pid namespace has a perfectly fine child_reaper.

The special case for the init_task in find_new_reaper appears to be the
instance of this problem that was considered in the code.

Given the semantics described and asked for of PR_SET_CHILD_SUBREAPER, I
believe has_child_subreaper needs to be strictly considered an
implementation detail, and any way that userspace can observe it is a
bug in the code.

Semantically what we want to do is walk up the parents in the process
tree. If a parent has is_child_subreaper we stop at it. If, in the
transition from one parent to the next, we are switching pid namespaces,
we want the reaper from the pid namespace.

As I recall has_child_subreaper was just supposed to be an optimization
so the common case would not have to walk up the process tree when
finding its parent.

If we retain any optimizations such as has_child_subreaper, please
consider the case where a process with is_child_subreaper set exits, and
what happens to its children.

Eric
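
For illustration, the walk described above can be sketched as a small
userspace model. This is not kernel code and not a proposed patch: the
names model_task, model_pid_ns and model_find_new_reaper are hypothetical,
and the sketch assumes that "the reaper from the pid namespace" means the
child_reaper of the namespace the exiting task lives in.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_task;

/* Hypothetical stand-in for a pid namespace: only its reaper matters here. */
struct model_pid_ns {
	struct model_task *child_reaper;
};

/* Hypothetical stand-in for a task: parent link, pid namespace, flag. */
struct model_task {
	struct model_task *parent;
	struct model_pid_ns *ns;
	bool is_child_subreaper;
};

/*
 * Pick the new parent for the children of an exiting task.
 *
 * Walk up the ancestors of 'dying'. Stop at the first ancestor that has
 * is_child_subreaper set. If the next ancestor lives in a different pid
 * namespace, stop walking and fall back to the child_reaper of the
 * namespace 'dying' lives in, so children never leave their namespace.
 *
 * The walk starts at dying->parent, so an exiting subreaper is never
 * chosen as the new parent of its own children.
 */
static struct model_task *
model_find_new_reaper(const struct model_task *dying)
{
	struct model_pid_ns *ns = dying->ns;
	struct model_task *p;

	for (p = dying->parent; p != NULL; p = p->parent) {
		if (p->ns != ns)
			break;		/* crossing a pid namespace boundary */
		if (p->is_child_subreaper)
			return p;	/* nearest subreaper in the same namespace */
	}
	return ns->child_reaper;
}
```

In the P/C/G scenario quoted above, C lives in X while its parent P is
still in the root namespace, so the walk stops at the boundary and G goes
to X->child_reaper whether or not P is a subreaper. No has_child_subreaper
bit is consulted, which is consistent with treating it purely as an
optimization.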