linux-fsdevel.vger.kernel.org archive mirror
* Building a clean namespace and MS_BIND across namespaces is now disabled
@ 2005-11-20 12:23 Eric W. Biederman
  2005-11-20 23:04 ` Serge E. Hallyn
  2005-11-21 23:13 ` Ram Pai
  0 siblings, 2 replies; 4+ messages in thread
From: Eric W. Biederman @ 2005-11-20 12:23 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Ram Pai, Miklos Szeredi, Christoph Hellwig,
	Jamie Lokier


Currently I am looking at what it takes to build a namespace
from scratch.

Intuitively I am thinking one of two forms:

> pid = clone(..., CLONE_NEWNS, ...);
> if (pid == 0) {
> 	umount2("/", MNT_DETACH);
> 	mount(NULL, "/", "ramfs", 0, NULL);
> 	chdir("/");
> 	chroot("/");
> }

> root_fd = open("path", O_DIRECTORY | O_RDONLY);
> pid = clone(..., CLONE_NEWNS, ...);
> if (pid == 0) {
> 	umount2("/", MNT_DETACH);
> 	fchdir(root_fd);
> 	mount(".", "/", NULL, MS_BIND, NULL);
> 	chroot(".");
> }

In practice the only form that seems to work is:

> pid = clone(..., CLONE_NEWNS, ...);
> if (pid == 0) {
> 	chdir("path");
> 	mount(".", ".", NULL, MS_BIND, NULL);
> 	chdir("path");
> 	mount(".", "/", NULL, MS_MOVE, NULL);
> 	chroot(".");
> }

Both of the failing forms fail miserably: MNT_DETACH itself works
fine, but afterwards current->fs->pwd and current->fs->root
both point to directories that are no longer part of a namespace,
so check_mnt fails.  In addition, there appears to be no way to
set current->fs->pwd or current->fs->root to a valid directory
in the current namespace afterwards.

Without some way of unmounting all of the filesystems, my
namespace is cluttered with all kinds of mounts I don't want
to see and can never use.  By walking through /proc/self/mounts I can
remove all but /.  Even when the problem is limited to a stack of
mounts on /, a deep enough stack is still ugly and confusing
to look at.

Like the umount case, mount(... "/") also does not
update current->fs->pwd and current->fs->root.  The
mount case can be worked around by using a temporary mount point
and MS_MOVE, so the semantics I want are possible,
but I still get a cluttered namespace with junk that is just
confusing to see.

The least intrusive fix I can think of would be to add a MNT_DETACH
option to mount, so that I could request that, instead of stacking
mounts, all underlying mounts at the given mount point be
unmounted as the mount is performed.

...

This leads me to the second part of my puzzle.  When you have
multiple namespaces around it can be handy to mount a filesystem
from a different namespace.  Especially if you want to derive
your new namespace from an old one.

In most versions of 2.6 this can be implemented by opening
a directory, and then when you want to mount it:
fchdir(dir_fd);
mount(".", "/some/path", NULL, MS_BIND, NULL);

With the latest version of 2.6 this ability was removed in:
ccd48bc7fac284caf704dcdcafd223a24f70bccf

Is there a correctness implication I am missing here?  Since
you can fchdir to the directory it doesn't look like there are any
security implications.  It looks like any correctness problems were
fixed in: 68b47139ea94ab6d05e89c654db8daa99e9a232c

Eric


* Re: Building a clean namespace and MS_BIND across namespaces is now disabled
  2005-11-20 12:23 Building a clean namespace and MS_BIND across namespaces is now disabled Eric W. Biederman
@ 2005-11-20 23:04 ` Serge E. Hallyn
  2005-11-21  0:01   ` Eric W. Biederman
  2005-11-21 23:13 ` Ram Pai
  1 sibling, 1 reply; 4+ messages in thread
From: Serge E. Hallyn @ 2005-11-20 23:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Al Viro, linux-fsdevel, Ram Pai, Miklos Szeredi,
	Christoph Hellwig, Jamie Lokier

[-- Attachment #1: Type: text/plain, Size: 1262 bytes --]

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Currently I am looking at what it takes to build a namespace
> from scratch.
> 
> Intuitively I am thinking one of two forms:
> 
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> > 	umount2("/", MNT_DETACH);
> > 	mount(NULL, "/", "ramfs", 0, NULL);
> > 	chdir("/");
> > 	chroot("/");
> > }
> 
> > root_fd = open("path", O_DIRECTORY | O_RDONLY);
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> > 	umount2("/", MNT_DETACH);
> > 	fchdir(root_fd);
> > 	mount(".", "/", NULL, MS_BIND, NULL);
> > 	chroot(".");
> > }

Why not do

	do_clone_namespace
	mount -t ramfs none /build
	# set up a fs under /build
	mkdir /build/oldmount
	cd /build
	pivot_root . oldmount
	umount -l oldmount

?  That's what I used to do in chroot_ns.c (attached) for bsdjail
(www.sf.net/projects/linuxjail).

Hope I didn't completely misunderstand your question...

> This leads me to the second part of my puzzle.  When you have
> multiple namespaces around it can be handy to mount a filesystem
> from a different namespace.  Especially if you want to derive
> your new namespace from an old one.

Again, is it acceptable to do this ahead of time before doing
pivot_root (but after cloning the namespace)?

-serge

[-- Attachment #2: chroot_ns.c --]
[-- Type: text/x-csrc, Size: 3637 bytes --]

/* 
 * chroot_ns.c
 * Author: Serge Hallyn <serue@us.ibm.com>
 * Date: Jan 25, 2005
 *
 * This version acts as "chroot" using namespaces.
 *
 * Usage:
 * 	chroot_ns -u /mnt/d6 mnt
 * This will create a new filesystem namespace, make /mnt/d6 the root
 * of the filesystem, place the old root under /mnt and immediately
 * unmount it, then run /bin/sh in the new filesystem.
 *
 * Note that pivot_root requires the new root to be under a different
 * vfsmount.  If you get the following error:
 *   pivot_root: Device or resource busy
 * then try the following command first:
 *
 *   mount --bind <newroot> <newroot>
 *
 * Now you should be able to call chroot_ns <newroot>.
 *
 * Copyright (C) 2004 International Business Machines <serue@us.ibm.com>
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 */

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <linux/unistd.h>
#include <sys/syscall.h>
#include <sys/mount.h>

#ifndef CLONE_NEWNS
#define CLONE_NEWNS 0x00020000
#endif

#ifndef MNT_DETACH
#define MNT_DETACH  0x00000002
#endif

#define MAX_PATH 256

static inline _syscall2(int, clone, int, flags, int, foo)
static _syscall2(int,pivot_root,const char *,new_root,const char *,put_old)


void usage(char *cmd)
{
	printf("Usage: %s [-u] <new_root> [<old_root>] [<command>]\n", cmd);
	printf("   Perform <command> under a new namespace with <new_root>\n");
	printf("   as the root of the filesystem.\n");
	printf("   If -u is specified, the old root will be unmounted before"
			" <command> is executed.\n");
	printf("   <old_root> is relative to the old root.\n");
	printf("   If unspecified, <old_root> is '/mnt'.\n");
	printf("   If unspecified, <command> is '/bin/sh'.\n");
	exit(-EINVAL);
}

#define OLD_ROOT "mnt"
#define CMD "/bin/sh"
int main(int argc, char *argv[])
{
	int pid = clone(CLONE_NEWNS | SIGCHLD,0);
	int ret;
	char *new_root, *old_root, *cmd, *argv0;
	char full_oldroot[MAX_PATH];
	int do_umount;


	if (pid == -1) {
		fprintf(stderr, "Permission denied on clone.\n");
		fprintf(stderr, "You must have CAP_SYS_ADMIN to clone a"
			" fs namespace.\n");
		exit(-1);
	}

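	if (pid != 0) {
		/* parent: wait for the child and pass on its exit status */
		if (waitpid(pid, &ret, 0) == -1)
			exit(-1);
		exit(WIFEXITED(ret) ? WEXITSTATUS(ret) : -1);
	}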

	argv0 = argv[0];
	if (argc > 1 && strcmp(argv[1], "-u") == 0) {
		do_umount = 1;
		argv++;
		argc--;
	} else
		do_umount = 0;

	if (argc < 2 || strcmp(argv[1], "-h") == 0)
		usage(argv0);

	new_root = argv[1];

	if (argc > 2)
		old_root = argv[2];
	else
		old_root = OLD_ROOT;

	if (argc > 3)
		cmd = argv[3];
	else
		cmd = CMD;

	if (strlen(old_root) + strlen(new_root) >= MAX_PATH-1) {
		printf("paths too long.\n");
		return -1;
	}

	snprintf(full_oldroot, MAX_PATH, "%s/%s", new_root, old_root);

	/* jump into the new root directory */
	printf("going into %s\n", new_root);
	ret = chdir(new_root);
	if (ret) {
		perror("chdir");
		exit(2);
	}

	/* pivot root */
	printf("switching %s and %s\n", new_root, full_oldroot);
	ret = pivot_root(new_root, full_oldroot);
	if (ret) {
		perror("pivot_root");
		printf("Try \"mount --bind %s %s\"\n", new_root, new_root);
		exit(ret);
	}

	/* unmount if requested */
	if (do_umount) {
		ret = umount2(old_root, MNT_DETACH);
		if (ret) {
			perror("umount");
			exit(2);
		}
	}

	/* Execute the command */
	execl(cmd, cmd, NULL);
	perror("execl");
	fprintf(stderr, "Cannot exec %s.\n", cmd);
	exit(-1);
}


* Re: Building a clean namespace and MS_BIND across namespaces is now disabled
  2005-11-20 23:04 ` Serge E. Hallyn
@ 2005-11-21  0:01   ` Eric W. Biederman
  0 siblings, 0 replies; 4+ messages in thread
From: Eric W. Biederman @ 2005-11-21  0:01 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Al Viro, linux-fsdevel, Ram Pai, Miklos Szeredi,
	Christoph Hellwig, Jamie Lokier

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
> Why not do
>
> 	do_clone_namespace
> 	mount -t ramfs none /build
> 	# set up a fs under /build
> 	mkdir /build/oldmount
> 	cd /build
> 	pivot_root . oldmount
> 	umount -l oldmount
>
> ?  That's what I used to do in chroot_ns.c (attached) for bsdjail
> (www.sf.net/projects/linuxjail).
>
> Hope I didn't completely misunderstand your question...

Pivot root is probably a functional solution.  Part of
the reason for the question was that I recall Al Viro
making assertions to the effect that pivot_root and friends
should no longer be necessary.

The other part is that by doing what I described you can
get yourself into an interesting pickle, and I wanted a clear
reading on whether this is a "don't do that, then" kind of scenario.

In addition, a pain with pivot_root is that you may
need at least a mount point on the original filesystem, and coordinating
all of those pieces may be unnecessarily complex.

>> This leads me to the second part of my puzzle.  When you have
>> multiple namespaces around it can be handy to mount a filesystem
>> from a different namespace.  Especially if you want to derive
>> your new namespace from an old one.
>
> Again, is it acceptable to do this ahead of time before doing
> pivot_root (but after cloning the namespace)?

In general you can't, because you are not necessarily descended
from a namespace you want to bind mount a filesystem from.
As an example, using bind mounts across namespaces you can fairly easily
match the whole shared subtree feature that is being implemented now, but
with userspace coordinating the sharing instead of the kernel.

> #define MAX_PATH 256

You should be able to get this limit out of an appropriate header;
on Linux it is spelled PATH_MAX and comes from <limits.h>.
As I recall this value is generally 4096, at least for
the kernel limit.

> static inline _syscall2(int, clone, int, flags, int, foo)
> static _syscall2(int,pivot_root,const char *,new_root,const char *,put_old)

You probably want to make these wrappers around the libc syscall()
function.  I don't remember the specifics but I recall there being
some portability limitations to using the _syscall macros.
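
Something along these lines, as a sketch: syscall(2) and
SYS_pivot_root are the real libc interfaces, and the wrapper name
sys_pivot_root() is made up here to avoid clashing with any symbol
the C library may export.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Thin wrapper around the pivot_root system call.
 * Returns 0 on success, -1 with errno set on failure.
 */
int sys_pivot_root(const char *new_root, const char *put_old)
{
	return (int)syscall(SYS_pivot_root, new_root, put_old);
}
```

The same pattern would work for clone, although the raw clone
argument order differs between architectures, so the C library's
own clone() wrapper is probably the safer choice there.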

Eric


* Re: Building a clean namespace and MS_BIND across namespaces is now disabled
  2005-11-20 12:23 Building a clean namespace and MS_BIND across namespaces is now disabled Eric W. Biederman
  2005-11-20 23:04 ` Serge E. Hallyn
@ 2005-11-21 23:13 ` Ram Pai
  1 sibling, 0 replies; 4+ messages in thread
From: Ram Pai @ 2005-11-21 23:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Christoph Hellwig,
	Jamie Lokier

On Sun, 2005-11-20 at 04:23, Eric W. Biederman wrote:
> Currently I am looking at what it takes to build a namespace
> from scratch.
> 
> Intuitively I am thinking one of two forms:
> 
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> > 	umount2("/", MNT_DETACH);
> > 	mount(NULL, "/", "ramfs", 0, NULL);
> > 	chdir("/");
> > 	chroot("/");
> > }

I don't see why this should fail.
When a new namespace is created, the new task's fs->root and fs->pwd
are set appropriately to the corresponding mounts in the new namespace.
Take a look at copy_namespace(). What am I missing?

> 
> > root_fd = open("path", O_DIRECTORY | O_RDONLY);
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> > 	umount2("/", MNT_DETACH);
> > 	fchdir(root_fd);
> > 	mount(".", "/", NULL, MS_BIND, NULL);
> > 	chroot(".");
> > }
> 
> In practice the only form that seems to work is:
> 
> > pid = clone(..., CLONE_NEWNS, ...);
> > if (pid == 0) {
> > 	chdir("path");
> > 	mount(".", ".", NULL, MS_BIND, NULL);
> > 	chdir("path");
> > 	mount(".", "/", NULL, MS_MOVE, NULL);
> > 	chroot(".");
> > }
> 
> Both of the failing forms fail miserably because while MNT_DETACH
> works fine afterwords current->fs->pwd and current->fs->root
> both point to directories that are no longer part of a namespace,
> so check_mnt fails.  In addition there appears to be no way to 
> set current->fs->pwd or current->fs->root to a valid directory 
> in the current namespace afterwards.

I guess your requirement is:
  1) create a new namespace
  2) get rid of all the mounts in the new namespace
  3) stitch new mounts into the new namespace selectively, using the
         ones from the old namespace.

Right?  Steps (1) and (2) can be done with the new 2.6.15* kernel.
Step (3) cannot be done because bind mounts across namespaces have been
invalidated.  But if all you want is to selectively get rid of some
mounts in the new namespace, why not just umount them?

RP







