From: ebiederm@xmission.com (Eric W. Biederman)
To: Roland McGrath
Cc: Oleg Nesterov, Sukadev Bhattiprolu, Andrew Morton, daniel@hozac.com, Containers, linux-kernel@vger.kernel.org
Date: Thu, 19 Feb 2009 16:35:58 -0800
Subject: Re: [PATCH 7/7][v8] SI_USER: Masquerade si_pid when crossing pid ns boundary
In-Reply-To: <20090219235159.6A542FC3BE@magilla.sf.frob.com> (Roland McGrath's message of "Thu, 19 Feb 2009 15:51:59 -0800 (PST)")
References: <20090219030207.GA18783@us.ibm.com> <20090219030743.GG18990@us.ibm.com> <20090219185159.GA374@redhat.com> <20090219223137.GA10378@redhat.com> <20090219235159.6A542FC3BE@magilla.sf.frob.com>

Roland McGrath writes:

>> Suppose I have 3 processes in a process group in three separate pid
>> namespaces.
>>
>> Looking from the init pid namespace I have:
>> pid  pgrp  ppid
>> 10   10    1
>> 11   10    10
>> 12   10    11
>>
>> Looking from the pid namespace of pid 11 I have:
>> pid  pgrp  ppid
>> 0    0     0
>> 1    0     0
>> 2    0     1
>>
>> Looking from the pid namespace of pid 12 I have:
>> pid  pgrp  ppid
>> 0    0     0
>> 0    0     0
>> 1    0     0
>>
>> So if the process with pid 12 in the initial pid namespace
>> sends to process group 0.
>
> There is no "process group 0".  0 means "the sender's pgrp".

Exactly.  It just happens in this case that pid_nr_ns also returns 0
as the process group number for the process group the process is a
member of, because that group was created outside of the current pid
namespace.

> One possibility is that perhaps what people really want the pid_ns to mean
> is that "the sender's pgrp" in the view of the sender does not include any
> processes outside its pid_ns scope.  That would be consistent with the
> behavior of kill (kill_something_info) on -1; it's described as "all
> processes", but in fact means "all processes within my pid_ns scope".
>
> What I mean to describe there is changing kill_something_info, so that
> e.g. killpg() inside the NS would affect only the NS init itself but e.g.
> ^Z (effectively an implicit killpg() that's always from the global NS)
> would also go to that init's "mother" pgrp in the outer NS.
>
> Another possibility is to decide that's just not worth having at all, and
> CLONE_NEWNS should just implicitly reset pgrp to self.  That is simple.
> But perhaps today someone has a script running a pid_ns-world whose init is
> gracefully killed by ^C of the whole script and we wouldn't want to break
> that if it is actually useful now.

It is especially useful, and this is a deliberate feature.

Having sessions and process groups extend across pid namespace
borders means you can share a tty and have job control work correctly.
Very handy for circumstances where you want a lightweight temporary
container, and something I am actively using today.

The practical benefit is that you can upgrade from situations where
you would previously have used chroot, without extra hassle.

In practice I don't care about si_pid, and I doubt I care about
processes sending signals outside of their pid namespace.  But I do
care about sharing a tty and a session and having job control work.

>> pid 10 should see si_pid 12.
>> pid 11 should see si_pid 2.
>
> We indeed have this problem if we think it's useful to continue to have
> a concept of pgrp for the sub-init that can see outside its own NS.
>
>> Neither should see si_pid 0, as from_ancestor_ns will not be true.
>
> Perhaps replace from_ancestor_ns with struct pid_namespace *sender_ns?
> (I don't know if there was already a can of worms with such an idea before.)
> Then si_pid could be translated as appropriate for each recipient.
> (Or perhaps just struct pid *sender and reset si_pid from that.)

The last was my original line of thinking.  I seem to recall Oleg
figuring the code gets pretty ugly when you add in the necessary test
to see if si_pid is actually present.

There are several other cases where we also signal a process outside
of our current pid namespace and where we have a pid inside the
recipient's pid namespace.  do_notify_parent is the easiest example.
However, those cases can get the value right because they are unicast
signals and know their recipient when they set the si_pid originally.
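To make the multicast case concrete, here is a minimal user-space
sketch (plain C, not kernel code) of translating si_pid separately for
each recipient, using the three processes from the tables above.  All
of the toy_* names, the fixed three-level layout, and the p10/p11/p12
objects are hypothetical.  The only thing taken from the discussion is
that a pid_nr_ns-style lookup yields 0 when the pid is not visible from
the namespace you ask about.  The per-level number array is my
assumption about how such a lookup could be modeled, not a copy of the
real data structures.

```c
/* User-space toy model, not kernel code.  All toy_* names are hypothetical. */
#include <stdio.h>

#define TOY_MAX_LEVEL 3

struct toy_ns {
	int level;			/* 0 for the initial pid namespace */
};

struct toy_upid {
	int nr;				/* pid number as seen in ->ns */
	const struct toy_ns *ns;
};

struct toy_pid {
	int level;			/* level of the namespace the task lives in */
	struct toy_upid numbers[TOY_MAX_LEVEL];	/* entries 0..level are valid */
};

/* Same idea as pid_nr_ns(): 0 means "not visible from ns". */
int toy_pid_nr_ns(const struct toy_pid *pid, const struct toy_ns *ns)
{
	if (ns->level <= pid->level && pid->numbers[ns->level].ns == ns)
		return pid->numbers[ns->level].nr;
	return 0;
}

/* A task lives in the deepest namespace it has a number in. */
const struct toy_ns *toy_active_ns(const struct toy_pid *pid)
{
	return pid->numbers[pid->level].ns;
}

/* si_pid as one particular recipient should see it. */
int toy_si_pid_for(const struct toy_pid *sender, const struct toy_pid *recipient)
{
	return toy_pid_nr_ns(sender, toy_active_ns(recipient));
}

/* Three nested namespaces and the three processes from the tables. */
const struct toy_ns ns_init = { 0 }, ns_mid = { 1 }, ns_leaf = { 2 };

const struct toy_pid p10 = { 0, { { 10, &ns_init } } };
const struct toy_pid p11 = { 1, { { 11, &ns_init }, { 1, &ns_mid } } };
const struct toy_pid p12 = { 2, { { 12, &ns_init }, { 2, &ns_mid }, { 1, &ns_leaf } } };

/* Multicast to the pgrp: si_pid has to be computed once per recipient. */
void toy_kill_pgrp_from(const struct toy_pid *sender)
{
	const struct toy_pid *pgrp[] = { &p10, &p11, &p12 };
	size_t i;

	for (i = 0; i < sizeof(pgrp) / sizeof(pgrp[0]); i++)
		printf("recipient %d sees si_pid %d\n",
		       toy_pid_nr_ns(pgrp[i], &ns_init),
		       toy_si_pid_for(sender, pgrp[i]));
}
```

Calling toy_kill_pgrp_from(&p12) from a main() reports si_pid 12 for
recipient 10 and si_pid 2 for recipient 11, which are the values given
above.  Calling toy_kill_pgrp_from(&p10) reports si_pid 0 for
recipients 11 and 12, which corresponds to the from_ancestor_ns case.
A unicast sender can do this lookup once up front; a process group
send cannot, because each recipient may need a different answer.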
My current line of thinking is either:
a) We pass in struct pid *sender and we reset si_pid in send_signal.
b) We make the rule that send_signal must receive a valid siginfo from
   the caller and we only do the extra work for process groups.

Eric