From: Ram Pai
Subject: Re: Building a clean namespace and MS_BIND across namespaces is now disabled
Date: Mon, 21 Nov 2005 15:13:30 -0800
Message-ID: <1132614810.4788.29.camel@localhost>
To: "Eric W. Biederman"
Cc: Al Viro, linux-fsdevel@vger.kernel.org, Miklos Szeredi, Christoph Hellwig, Jamie Lokier
List-Id: linux-fsdevel.vger.kernel.org

On Sun, 2005-11-20 at 04:23, Eric W. Biederman wrote:
> Currently I am looking at what it takes to build a namespace
> from scratch.
>
> Intuitively I am thinking one of two forms:
>
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> >     umount2("/", MNT_DETACH);
> >     mount(NULL, "/", "ramfs", 0, NULL);
> >     chdir("/");
> >     chroot("/");
> > }

I don't see why this should fail. When a new namespace is created, the
new task's fs->root and fs->pwd are set appropriately to the
corresponding mounts in the new namespace. Take a look at
copy_namespace(). What am I missing?

>
> > root_fd = open("path", O_DIRECTORY | O_RDONLY);
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> >     umount2("/", MNT_DETACH);
> >     fchdir(root_fd);
> >     mount(".", "/", NULL, MS_BIND, NULL);
> >     chroot(".");
> > }
>
> In practice the only form that seems to work is:
>
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> >     chdir("path");
> >     mount(".", ".", NULL, MS_BIND, NULL);
> >     chdir("path");
> >     mount(".", "/", NULL, MS_MOVE, NULL);
> >     chroot(".");
> > }
>
> Both of the failing forms fail miserably because while MNT_DETACH
> works fine, afterwards current->fs->pwd and current->fs->root
> both point to directories that are no longer part of a namespace,
> so check_mnt fails. In addition there appears to be no way to
> set current->fs->pwd or current->fs->root to a valid directory
> in the current namespace afterwards.

I guess your requirement is:

1) create a new namespace
2) get rid of all the mounts in the new namespace
3) stitch new mounts into the new namespace selectively, using the
   ones from the old namespace.

Right?

Steps (1) and (2) can be done with the new 2.6.15* kernel.
Step (3) cannot be done because bind mounting across namespaces has
been disabled.

But if all you want is to selectively get rid of some mounts in
the new namespace, why not just umount them?

RP

>
> Without some form of unmounting all of the filesystems my
> namespace is cluttered with all kinds of mounts I don't want
> to see, and can never use. By walking through /proc/self/mounts I can
> remove all but /. Even limiting the problem to a stack of mounts
> on / if that stack gets deep enough it is still ugly and confusing
> to look at.
>
> Like the umount case, mount(... "/") also does not
> update current->fs->pwd and current->fs->root. The
> latter can be worked around by using a temporary mount point
> and using MS_MOVE, so the semantics I want are possible
> but I still get a cluttered namespace with junk that is just
> confusing to see.
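
The /proc/self/mounts walk mentioned above could look roughly like the
sketch below. The helper names (mount_depth, deeper_first,
detach_all_but_root) are made up for illustration; only setmntent(),
getmntent(), endmntent(), qsort() and umount2() are real library calls.
It assumes that unmounting deepest paths first is good enough, and it
does not try to handle mounts hidden underneath other mounts.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

/* Number of path components: "/" -> 0, "/proc" -> 1, "/a/b" -> 2. */
static int mount_depth(const char *path)
{
    int depth = 0;

    assert(path != NULL);
    for (const char *p = path; *p; p++)
        if (*p == '/' && p[1] != '\0' && p[1] != '/')
            depth++;
    return depth;
}

/* qsort() comparator: deeper mount points sort before shallower ones. */
static int deeper_first(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;

    return mount_depth(*pb) - mount_depth(*pa);
}

/*
 * Hypothetical helper: lazily unmount everything listed in `table`
 * (e.g. "/proc/self/mounts") except "/".
 * Returns the number of umount2() calls that failed, or -1 if the
 * table could not be opened.
 */
static int detach_all_but_root(const char *table)
{
    FILE *fp = setmntent(table, "r");
    struct mntent *ent;
    char **dirs = NULL;
    size_t n = 0;
    int failed = 0;

    if (fp == NULL)
        return -1;

    /* Collect the mount points first; the table changes as we unmount. */
    while ((ent = getmntent(fp)) != NULL) {
        char **tmp;

        if (strcmp(ent->mnt_dir, "/") == 0)
            continue;
        tmp = realloc(dirs, (n + 1) * sizeof(*dirs));
        if (tmp == NULL)
            break;
        dirs = tmp;
        dirs[n] = strdup(ent->mnt_dir);
        if (dirs[n] == NULL)
            break;
        n++;
    }
    endmntent(fp);

    if (n > 1)
        qsort(dirs, n, sizeof(*dirs), deeper_first);

    for (size_t i = 0; i < n; i++) {
        if (umount2(dirs[i], MNT_DETACH) != 0)
            failed++;
        free(dirs[i]);
    }
    free(dirs);
    return failed;
}
```

This would be called as detach_all_but_root("/proc/self/mounts") in
the child after clone(..., CLONE_NEWNS, ...), with sufficient
privilege. As Eric notes, it still leaves whatever is stacked on /.
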
>
> The least intrusive fix I can think of would be to add a MNT_DETACH
> option to mount so I would be able to request that instead of stacking
> mounts all underlying mounts at the given mount point would be
> unmounted, as the mount is performed.
>
> ...
>
> This leads me to the second part of my puzzle. When you have
> multiple namespaces around it can be handy to mount a filesystem
> from a different namespace. Especially if you want to derive
> your new namespace from an old one.
>
> In most versions of 2.6 this can be implemented by opening
> a directory, and then when you want to mount it:
> fchdir(dir_fd);
> mount(".", "/some/path", NULL, MS_BIND, NULL);
>
> With the latest version of 2.6 this ability was removed in:
> ccd48bc7fac284caf704dcdcafd223a24f70bccf
>
> Is there a correctness implication I am missing here? Since
> you can fchdir to the directory it doesn't look like there are any
> security implications. It looks like any correctness problems were
> fixed in: 68b47139ea94ab6d05e89c654db8daa99e9a232c
>
> Eric
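
For reference, the fd-based bind mount quoted above amounts to
something like the following sketch. bind_from_fd is a made-up name,
and saving and restoring the working directory is an addition that is
not in the quoted snippet. Going by the description above, on kernels
that carry ccd48bc7fac284caf704dcdcafd223a24f70bccf the mount() call
should be expected to be refused when dir_fd refers to a directory in
another namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind-mount the directory behind dir_fd onto
 * target. Returns 0 on success or a negative errno value on failure.
 */
static int bind_from_fd(int dir_fd, const char *target)
{
    int saved_cwd;
    int err = 0;

    assert(target != NULL);

    /* Remember where we are so the caller's cwd is left unchanged. */
    saved_cwd = open(".", O_RDONLY | O_DIRECTORY);
    if (saved_cwd < 0)
        return -errno;

    /* Same two steps as the quoted snippet: fchdir(), then bind ".". */
    if (fchdir(dir_fd) != 0)
        err = -errno;
    else if (mount(".", target, NULL, MS_BIND, NULL) != 0)
        err = -errno;

    /* Go back to the original working directory. */
    if (fchdir(saved_cwd) != 0 && err == 0)
        err = -errno;
    close(saved_cwd);
    return err;
}
```

A caller would open the source directory before clone(...,
CLONE_NEWNS, ...) and call bind_from_fd(dir_fd, "/some/path") in the
child, checking the return value rather than assuming success.
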