From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Serge E. Hallyn"
Subject: Re: [PATCH 1/1] namespaces: introduce sys_hijack (v11)
Date: Tue, 12 Aug 2008 12:06:58 -0500
Message-ID: <20080812170658.GA11641@us.ibm.com>
References: <20080731183213.GA12033@us.ibm.com>
 <20080801092318.GA2002@wavehammer.waldi.eu.org>
 <20080801141152.GA11553@us.ibm.com>
 <20080801155148.GA16760@wavehammer.waldi.eu.org>
 <20080801163905.GA4647@us.ibm.com>
 <20080801171951.GA23754@wavehammer.waldi.eu.org>
 <20080801173817.GA21367@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20080801173817.GA21367-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Bastian Blank, Paul Menage, Pavel Emelyanov
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
List-Id: containers.vger.kernel.org

Quoting Serge E. Hallyn (serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Quoting Bastian Blank (bastian-yyjItF7Rl6lg9hUCZPvPmw@public.gmane.org):
> > On Fri, Aug 01, 2008 at 11:39:05AM -0500, Serge E. Hallyn wrote:
> > > Quoting Bastian Blank (bastian-yyjItF7Rl6lg9hUCZPvPmw@public.gmane.org):
> > > > Why is it not enough to use the pid of the ns creator? The ns cgroups
> > >
> > > pids wrap around
> >
> > Ups, yes.
> >
> > > > But I think I have a different problem. Currently, namespaces are
> > > > destructed if the last process using them exits. You change that, they
> > > > will survive until the cgroup dies. Or is that cgroup destructed when
> > > > there are no longer processes using the nsproxy? As the commit message
> > > > speaks about "pid wraparound" as problem, I doubt that.
> > >
> > > Correct.
> > > Having the namespaces stick around, and being able to attach
> > > to an empty container, was something Paul Menage had wanted IIRC.
> >
> > It may produce problems with pid namespaces. The namespace is cleared if
> > the child reaper dies and I'm not sure how well it behaves without a new
> > one, which you can't create.
> >
> > > But I'll leave that as is for now, until I hear something other than
> > > "this is so wrong it isn't funny" from Pavel :)
> >
> > I'm not sure if it is funny to add another piece which may hold
> > filesystems open. Currently we can have different namespaces. All of
> > them are attached to processes and can be removed with kill. Now this
> > code adds another copy to an (automatically created) cgroup.
> >
> > IMHO, the cgroup should be destructed automatically if the nsproxy is
> > about to be die.
>
> I certainly don't think your caution is unwarranted. I like to keep the
> refcounting in all of this as simple as possible.

And as always those calling for caution are vindicated. It turns out I
was grabbing a double-refcount on the nsproxy when a ns_cgroup is
cloned. After fixing that, I get warnings about potential circular
locking involving cgroup_mutex and namespace_sem. This is because
cgroup_mutex depends on namespace_sem, but now doing rmdir on a
once-filled ns_cgroup calls put_fs_struct(ns_cgroup->fs).

But again, this patch was resent to solicit comment on the general
approach. So I will put this patch aside again, unless I hear:

	1. From Pavel, that he actually would like to use this approach
	   for namespace entering.
	2. From Paul, that he still has a need for entering empty
	   cgroups.

Otherwise, there is still the point of view (held I believe by Eric)
that the right thing to do is provide the monitoring and control over
containers that we need through proper namespace semantics and exported
filesystems.

thanks,
-serge