* [RFC][PATCH] rbind across namespaces
@ 2005-05-20 22:11 Ram
2005-05-21 6:27 ` Miklos Szeredi
0 siblings, 1 reply; 36+ messages in thread
From: Ram @ 2005-05-20 22:11 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel; +Cc: Andrew Morton, viro, Miklos Szeredi, jamie
[-- Attachment #1: Type: text/plain, Size: 773 bytes --]
I have enclosed a patch that allows rbinds across any two namespaces.
NOTE: currenly bind from foriegn namespace to current namespace is
allowed. This patch now allows:
binds/rbinds from any namespace to any other namespace, under the
assumption that if a process has access to a namespace, it ought to
have permission to manipulate that namespace.
The patch incorporates ideas from Miklos and Jamie, and is dependent
on Miklos's 'fix race in mark_mounts_for_expiry' patch to function
correctly. Also it depends on Miklos's 'fix bind mount from foreign
namespace' patch, because without that patch umounts would fail.
Though we have not come up with any security reason towards why
this functionality should not be allowed, I am sure it may open
up some concerns.
RP
[-- Attachment #2: rbind_across_namespace.patch --]
[-- Type: text/x-patch, Size: 2616 bytes --]
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
--- /home/linux/views/linux-2.6.12-rc4/fs/namespace.c 2005-05-06 23:22:29.000000000 -0700
+++ 2.6.12-rc4/fs/namespace.c 2005-05-20 14:44:57.000000000 -0700
@@ -616,11 +616,15 @@ out_unlock:
}
/*
- * do loopback mount.
+ * do loopback mount. The loopback mount can be done from any namespace
+ * to any other namespace including the current namespace, as long as
+ * the task acquired rights to manipulate them.
*/
static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
{
struct nameidata old_nd;
+ struct namespace *mntpt_ns = nd->mnt->mnt_namespace, *old_ns;
+ int mntpt_ns_flag=0, old_ns_flag=0;
struct vfsmount *mnt = NULL;
int err = mount_is_safe(nd);
if (err)
@@ -631,16 +635,54 @@ static int do_loopback(struct nameidata
if (err)
return err;
- down_write(¤t->namespace->sem);
+ old_ns = old_nd.mnt->mnt_namespace;
+
+ /*
+ * make sure the namespaces do not disapper while
+ * we operate on it
+ */
err = -EINVAL;
- if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) {
- err = -ENOMEM;
- if (recurse)
- mnt = copy_tree(old_nd.mnt, old_nd.dentry);
- else
- mnt = clone_mnt(old_nd.mnt, old_nd.dentry);
+ if (mntpt_ns != current->namespace) {
+ spin_lock(&vfsmount_lock);
+ if (!mntpt_ns->root) {
+ spin_unlock(&vfsmount_lock);
+ goto out;
+ }
+ get_namespace(mntpt_ns);
+ spin_unlock(&vfsmount_lock);
+ mntpt_ns_flag=1;
}
+ if (old_ns != current->namespace) {
+ spin_lock(&vfsmount_lock);
+ if (!old_ns->root) {
+ spin_unlock(&vfsmount_lock);
+ goto release_mntpt_ns;
+ }
+ get_namespace(old_ns);
+ spin_unlock(&vfsmount_lock);
+ old_ns_flag=1;
+ }
+
+ /*
+ * make sure we don't race with some
+ * other thread manipulating the
+ * namespaces.
+ */
+ if (old_ns < mntpt_ns) {
+ down_write(&old_ns->sem);
+ }
+ down_write(&mntpt_ns->sem);
+ if (old_ns > mntpt_ns) {
+ down_write(&old_ns->sem);
+ }
+
+ err = -ENOMEM;
+ if (recurse)
+ mnt = copy_tree(old_nd.mnt, old_nd.dentry);
+ else
+ mnt = clone_mnt(old_nd.mnt, old_nd.dentry);
+
if (mnt) {
/* stop bind mounts from expiring */
spin_lock(&vfsmount_lock);
@@ -656,7 +698,23 @@ static int do_loopback(struct nameidata
mntput(mnt);
}
- up_write(¤t->namespace->sem);
+ if (old_ns < mntpt_ns) {
+ up_write(&old_ns->sem);
+ }
+ up_write(&mntpt_ns->sem);
+ if (old_ns > mntpt_ns) {
+ up_write(&old_ns->sem);
+ }
+
+ if (old_ns_flag) {
+ put_namespace(old_ns);
+ }
+
+release_mntpt_ns:
+ if (mntpt_ns_flag) {
+ put_namespace(mntpt_ns);
+ }
+out:
path_release(&old_nd);
return err;
}
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [RFC][PATCH] rbind across namespaces 2005-05-20 22:11 [RFC][PATCH] rbind across namespaces Ram @ 2005-05-21 6:27 ` Miklos Szeredi 2005-05-21 7:26 ` Ram 0 siblings, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-21 6:27 UTC (permalink / raw) To: linuxram; +Cc: linux-kernel, linux-fsdevel, akpm, viro, jamie > I have enclosed a patch that allows rbinds across any two namespaces. > NOTE: currenly bind from foriegn namespace to current namespace is > allowed. This patch now allows: Note: you are accessing ->mnt_namespace without holding vfsmount_lock. Also why check current->namespace? It doesn't hurt to do get_namespace() even if it's not strictly needed. And it would simplify the code. In fact all uses of current->namespace and check_mnt() could be eliminated from namespace.c. The only use of ->namespace would be in copy_namespace() and exit_namespace(). Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 6:27 ` Miklos Szeredi @ 2005-05-21 7:26 ` Ram 2005-05-21 8:09 ` Miklos Szeredi 2005-05-21 13:43 ` Jamie Lokier 0 siblings, 2 replies; 36+ messages in thread From: Ram @ 2005-05-21 7:26 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, Andrew Morton, viro, jamie [-- Attachment #1: Type: text/plain, Size: 1022 bytes --] On Fri, 2005-05-20 at 23:27, Miklos Szeredi wrote: > > I have enclosed a patch that allows rbinds across any two namespaces. > > NOTE: currenly bind from foriegn namespace to current namespace is > > allowed. This patch now allows: > > Note: you are accessing ->mnt_namespace without holding vfsmount_lock. I realized that just after sending the patch :( . Have corrected it in the new patch. > > Also why check current->namespace? It doesn't hurt to do > get_namespace() even if it's not strictly needed. And it would > simplify the code. however get_namespace() will chew up some extra cycles sometimes unnecessarily. I did incorporate your comment and got much simpler code. > > In fact all uses of current->namespace and check_mnt() could be > eliminated from namespace.c. The only use of ->namespace would be in > copy_namespace() and exit_namespace(). Yes. I will make this a separate patch, which I will send out soon. Will have to look at each of the cases deeply. Enclosed the simplified patch, RP [-- Attachment #2: rbind_across_namespace1.patch --] [-- Type: text/x-patch, Size: 2797 bytes --] Summary: Allows rbinds across any two namespaces. NOTE: currenly bind from foriegn namespace to current namespace is allowed. This patch now allows: binds/rbinds from any namespace to any other namespace, under the assumption that if a process has access to a namespace, it ought to have permission to manipulate that namespace. The patch incorporates ideas from Miklos and Jamie, and is dependent on Miklos's "fix race in mark_mounts_for_expiry" patch to function correctly. Also it depends on Miklos's "fix bind mount from foreign namespace" patch, because without that patch umounts would fail. Signed off by Ram Pai <linuxram@us.ibm.com> --- /home/linux/views/linux-2.6.12-rc4/fs/namespace.c 2005-05-06 23:22:29.000000000 -0700 +++ linux-2.6.12-rc4/fs/namespace.c 2005-05-21 00:17:31.000000000 -0700 @@ -616,11 +616,14 @@ out_unlock: } /* - * do loopback mount. + * do loopback mount. The loopback mount can be done from any namespace + * to any other namespace including the current namespace, as long as + * the task acquired rights to manipulate them. */ static int do_loopback(struct nameidata *nd, char *old_name, int recurse) { struct nameidata old_nd; + struct namespace *mntpt_ns, *old_ns; struct vfsmount *mnt = NULL; int err = mount_is_safe(nd); if (err) @@ -631,16 +634,39 @@ static int do_loopback(struct nameidata if (err) return err; - down_write(¤t->namespace->sem); err = -EINVAL; - if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) { - err = -ENOMEM; - if (recurse) - mnt = copy_tree(old_nd.mnt, old_nd.dentry); - else - mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + spin_lock(&vfsmount_lock); + + mntpt_ns = nd->mnt->mnt_namespace; + old_ns = old_nd.mnt->mnt_namespace; + if (!mntpt_ns || !mntpt_ns->root || !old_ns || !old_ns->root) { + spin_unlock(&vfsmount_lock); + goto out; + } + get_namespace(mntpt_ns); + get_namespace(old_ns); + + spin_unlock(&vfsmount_lock); + + /* + * make sure we don't race with some + * other thread manipulating the + * namespaces. + */ + if (old_ns < mntpt_ns) { + down_write(&old_ns->sem); + } + down_write(&mntpt_ns->sem); + if (old_ns > mntpt_ns) { + down_write(&old_ns->sem); } + err = -ENOMEM; + if (recurse) + mnt = copy_tree(old_nd.mnt, old_nd.dentry); + else + mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + if (mnt) { /* stop bind mounts from expiring */ spin_lock(&vfsmount_lock); @@ -656,7 +682,17 @@ static int do_loopback(struct nameidata mntput(mnt); } - up_write(¤t->namespace->sem); + if (old_ns < mntpt_ns) { + up_write(&old_ns->sem); + } + up_write(&mntpt_ns->sem); + if (old_ns > mntpt_ns) { + up_write(&old_ns->sem); + } + + put_namespace(old_ns); + put_namespace(mntpt_ns); +out: path_release(&old_nd); return err; } ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 7:26 ` Ram @ 2005-05-21 8:09 ` Miklos Szeredi 2005-05-21 8:45 ` Ram 2005-05-21 13:46 ` Jamie Lokier 2005-05-21 13:43 ` Jamie Lokier 1 sibling, 2 replies; 36+ messages in thread From: Miklos Szeredi @ 2005-05-21 8:09 UTC (permalink / raw) To: linuxram; +Cc: linux-kernel, linux-fsdevel, akpm, viro, jamie > Enclosed the simplified patch, Looks much better :) I still see a problem: what if old_nd.mnt is already detached, and bind is non-recursive. Now it fails with EINVAL, though it used to work (and I think is very useful). When doing up_write(...) you don't have to keep the order, just check if the namespaces are not equal for the second up_write(). And why don't you do this: if (old_ns < mntpt_ns) down_write(&old_ns->sem); instead of this if (old_ns < mntpt_ns) { down_write(&old_ns->sem); } Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 8:09 ` Miklos Szeredi @ 2005-05-21 8:45 ` Ram 2005-05-21 9:09 ` Miklos Szeredi 2005-05-21 9:48 ` Miklos Szeredi 2005-05-21 13:46 ` Jamie Lokier 1 sibling, 2 replies; 36+ messages in thread From: Ram @ 2005-05-21 8:45 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, Andrew Morton, viro, jamie On Sat, 2005-05-21 at 01:09, Miklos Szeredi wrote: > > Enclosed the simplified patch, > > Looks much better :) > > I still see a problem: what if old_nd.mnt is already detached, and > bind is non-recursive. Now it fails with EINVAL, though it used to > work (and I think is very useful). I am not getting this comment. R u assuming that a detached mount will have NULL namespace? If so I dont see it being the case. Or am I missing some subtle point? > When doing up_write(...) you don't have to keep the order, just check > if the namespaces are not equal for the second up_write(). Yes. saves atleast 2 lines. > > And why don't you do this: > > if (old_ns < mntpt_ns) > down_write(&old_ns->sem); > > instead of this > > if (old_ns < mntpt_ns) { > down_write(&old_ns->sem); > } Will do. Saves another few lines. :) I will send out the new patch once I understand your first comment. RP > > Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 8:45 ` Ram @ 2005-05-21 9:09 ` Miklos Szeredi 2005-05-21 10:07 ` Ram 2005-05-21 9:48 ` Miklos Szeredi 1 sibling, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-21 9:09 UTC (permalink / raw) To: linuxram; +Cc: miklos, linux-kernel, linux-fsdevel, akpm, viro, jamie > > I still see a problem: what if old_nd.mnt is already detached, and > > bind is non-recursive. Now it fails with EINVAL, though it used to > > work (and I think is very useful). > > I am not getting this comment. R u assuming that a detached mount > will have NULL namespace? Yes. > If so I dont see it being the case. See this patch: http://marc.theaimsgroup.com/?l=linux-kernel&m=111627383207049&w=2 Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 9:09 ` Miklos Szeredi @ 2005-05-21 10:07 ` Ram 2005-05-21 13:12 ` Miklos Szeredi 0 siblings, 1 reply; 36+ messages in thread From: Ram @ 2005-05-21 10:07 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, Andrew Morton, viro, jamie [-- Attachment #1: Type: text/plain, Size: 648 bytes --] On Sat, 2005-05-21 at 02:09, Miklos Szeredi wrote: > > > I still see a problem: what if old_nd.mnt is already detached, and > > > bind is non-recursive. Now it fails with EINVAL, though it used to > > > work (and I think is very useful). > > > > I am not getting this comment. R u assuming that a detached mount > > will have NULL namespace? > > Yes. > > > If so I dont see it being the case. > > See this patch: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=111627383207049&w=2 Ok. look at the enclosed patch. Does it look any better? The special casing for detached mounts ate up some brain cells and made the code less simpler. RP [-- Attachment #2: rbind_across_namespace2.patch --] [-- Type: text/x-patch, Size: 2955 bytes --] Summary: Allows rbinds across any two namespaces. NOTE: currenly bind from foriegn namespace to current namespace is allowed. This patch now allows: binds/rbinds from any namespace to any other namespace, under the assumption that if a process has access to a namespace, it ought to have permission to manipulate that namespace. The only case disallowed is a recursive bind to a detached mount. The patch incorporates ideas from Miklos and Jamie, and is dependent on Miklos's "fix race in mark_mounts_for_expiry" patch to function correctly. Also it depends on Miklos's "fix bind mount from foreign namespace" patch, because without that patch umounts would fail. Signed off by Ram Pai <linuxram@us.ibm.com> --- /home/linux/views/linux-2.6.12-rc4/fs/namespace.c 2005-05-06 23:22:29.000000000 -0700 +++ linux-2.6.12-rc4/fs/namespace.c 2005-05-21 02:58:48.000000000 -0700 @@ -616,11 +616,14 @@ out_unlock: } /* - * do loopback mount. + * do loopback mount. The loopback mount can be done from any namespace + * to any other namespace including the current namespace, as long as + * the task acquired rights to manipulate them. */ static int do_loopback(struct nameidata *nd, char *old_name, int recurse) { struct nameidata old_nd; + struct namespace *mntpt_ns, *old_ns; struct vfsmount *mnt = NULL; int err = mount_is_safe(nd); if (err) @@ -631,15 +634,41 @@ static int do_loopback(struct nameidata if (err) return err; - down_write(¤t->namespace->sem); err = -EINVAL; - if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) { - err = -ENOMEM; - if (recurse) - mnt = copy_tree(old_nd.mnt, old_nd.dentry); - else - mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + spin_lock(&vfsmount_lock); + + /* + * Allow all kinds of binds and rbinds except + * recursive bind to a detached mount + */ + mntpt_ns = nd->mnt->mnt_namespace; + old_ns = old_nd.mnt->mnt_namespace; + if (!mntpt_ns || !mntpt_ns->root || + (recurse && (!old_ns || !old_ns->root))) { + spin_unlock(&vfsmount_lock); + goto out; } + get_namespace(mntpt_ns); + if (recurse && old_ns != mntpt_ns) + get_namespace(old_ns); + + spin_unlock(&vfsmount_lock); + + /* + * make sure we don't race with some other thread manipulating the + * namespaces. + */ + if (recurse && (old_ns < mntpt_ns)) + down_write(&old_ns->sem); + down_write(&mntpt_ns->sem); + if (recurse && (old_ns > mntpt_ns)) + down_write(&old_ns->sem); + + err = -ENOMEM; + if (recurse) + mnt = copy_tree(old_nd.mnt, old_nd.dentry); + else + mnt = clone_mnt(old_nd.mnt, old_nd.dentry); if (mnt) { /* stop bind mounts from expiring */ @@ -656,7 +685,14 @@ static int do_loopback(struct nameidata mntput(mnt); } - up_write(¤t->namespace->sem); + up_write(&mntpt_ns->sem); + if (recurse && (old_ns != mntpt_ns)) { + up_write(&old_ns->sem); + put_namespace(old_ns); + } + put_namespace(mntpt_ns); + +out: path_release(&old_nd); return err; } ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 10:07 ` Ram @ 2005-05-21 13:12 ` Miklos Szeredi 2005-05-22 20:25 ` Ram 0 siblings, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-21 13:12 UTC (permalink / raw) To: linuxram; +Cc: linux-kernel, linux-fsdevel, akpm, viro, jamie > Ok. look at the enclosed patch. Does it look any better? The special > casing for detached mounts ate up some brain cells and made the code > less simpler. Yes, this isn't trivial stuff. I realized one more thing: nd->mnt (the destination vfsmount) might be detached while waiting for the semaphore. So that needs to be rechecked after taking the semaphores. And the same for old_nd->mnt in case of rbind. Though I'm not sure what the semantics should be in this case: 1) rbind always fails if the source is detached 2) rbind always succeeds, and if the source is detached it just copies that single mount I like 2) better. Is there anything against it? Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 13:12 ` Miklos Szeredi @ 2005-05-22 20:25 ` Ram 2005-05-22 20:51 ` Ram 0 siblings, 1 reply; 36+ messages in thread From: Ram @ 2005-05-22 20:25 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, Andrew Morton, viro, jamie [-- Attachment #1: Type: text/plain, Size: 1184 bytes --] On Sat, 2005-05-21 at 06:12, Miklos Szeredi wrote: > > Ok. look at the enclosed patch. Does it look any better? The special > > casing for detached mounts ate up some brain cells and made the code > > less simpler. > > Yes, this isn't trivial stuff. > > I realized one more thing: nd->mnt (the destination vfsmount) might be > detached while waiting for the semaphore. So that needs to be > rechecked after taking the semaphores. Ok. fixed that. that was surprisingly trivial though initially it looked like some complex locking. > > And the same for old_nd->mnt in case of rbind. Though I'm > not sure what the semantics should be in this case: > > 1) rbind always fails if the source is detached > 2) rbind always succeeds, and if the source is detached it just > copies that single mount > I like 2) better. Is there anything against it? sure. as much functionality as we can get. I have incorporated (2). Take a look at the enclosed patch, RP > > Miklos > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: rbind_across_namespace3.patch --] [-- Type: text/x-patch, Size: 2589 bytes --] --- /home/linux/views/linux-2.6.12-rc4/fs/namespace.c 2005-05-06 23:22:29.000000000 -0700 +++ linux-2.6.12-rc4/fs/namespace.c 2005-05-22 13:22:46.000000000 -0700 @@ -616,11 +616,14 @@ out_unlock: } /* - * do loopback mount. + * do loopback mount. The loopback mount can be done from any namespace + * to any other namespace including the current namespace, as long as + * the task acquired rights to manipulate them. */ static int do_loopback(struct nameidata *nd, char *old_name, int recurse) { struct nameidata old_nd; + struct namespace *mntpt_ns, *old_ns; struct vfsmount *mnt = NULL; int err = mount_is_safe(nd); if (err) @@ -631,15 +634,54 @@ static int do_loopback(struct nameidata if (err) return err; - down_write(¤t->namespace->sem); err = -EINVAL; - if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) { - err = -ENOMEM; - if (recurse) - mnt = copy_tree(old_nd.mnt, old_nd.dentry); - else - mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + spin_lock(&vfsmount_lock); + + /* + * Allow all kinds of binds and rbinds except + * recursive bind to a detached mount + */ + mntpt_ns = nd->mnt->mnt_namespace; + old_ns = old_nd.mnt->mnt_namespace; + if (!mntpt_ns || !mntpt_ns->root || + (recurse && (!old_ns || !old_ns->root))) { + spin_unlock(&vfsmount_lock); + goto out; } + get_namespace(mntpt_ns); + if (recurse && old_ns != mntpt_ns) + get_namespace(old_ns); + + spin_unlock(&vfsmount_lock); + + /* + * make sure we don't race with some other thread manipulating the + * namespaces. + */ + if (recurse && (old_ns < mntpt_ns)) + down_write(&old_ns->sem); + down_write(&mntpt_ns->sem); + if (recurse && (old_ns > mntpt_ns)) + down_write(&old_ns->sem); + + + /* + * well... mounts might have detached while we acquired + * the semaphores. Revalidate that the destination mount + * is still attached. + */ + if (!nd->mnt->mnt_namespace) + goto error_out; + + /* + * and if the source mount is detached, than just do + * a bind, instead of a rbind + */ + err = -ENOMEM; + if (recurse && old_nd.mnt->mnt_namespace) + mnt = copy_tree(old_nd.mnt, old_nd.dentry); + else + mnt = clone_mnt(old_nd.mnt, old_nd.dentry); if (mnt) { /* stop bind mounts from expiring */ @@ -656,7 +698,15 @@ static int do_loopback(struct nameidata mntput(mnt); } - up_write(¤t->namespace->sem); +error_out: + up_write(&mntpt_ns->sem); + if (recurse && (old_ns != mntpt_ns)) { + up_write(&old_ns->sem); + put_namespace(old_ns); + } + put_namespace(mntpt_ns); + +out: path_release(&old_nd); return err; } ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-22 20:25 ` Ram @ 2005-05-22 20:51 ` Ram 2005-05-23 5:08 ` Miklos Szeredi 0 siblings, 1 reply; 36+ messages in thread From: Ram @ 2005-05-22 20:51 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, Andrew Morton, viro, jamie [-- Attachment #1: Type: text/plain, Size: 1405 bytes --] On Sun, 2005-05-22 at 13:25, Ram wrote: > On Sat, 2005-05-21 at 06:12, Miklos Szeredi wrote: > > > Ok. look at the enclosed patch. Does it look any better? The special > > > casing for detached mounts ate up some brain cells and made the code > > > less simpler. > > > > Yes, this isn't trivial stuff. > > > > I realized one more thing: nd->mnt (the destination vfsmount) might be > > detached while waiting for the semaphore. So that needs to be > > rechecked after taking the semaphores. > > Ok. fixed that. that was surprisingly trivial though initially it looked > like some complex locking. > > > > > And the same for old_nd->mnt in case of rbind. Though I'm > > not sure what the semantics should be in this case: > > > > 1) rbind always fails if the source is detached > > 2) rbind always succeeds, and if the source is detached it just > > copies that single mount > > > I like 2) better. Is there anything against it? > > sure. as much functionality as we can get. I have incorporated (2). > > Take a look at the enclosed patch, The patch failed rbinds in some cases. Fixed it. The enclosed patch has a high chance of being bug free. RP > RP > > > > Miklos > > - > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: rbind_across_namespace4.patch --] [-- Type: text/x-patch, Size: 3338 bytes --] Summary: Allows rbinds across any two namespaces. NOTE: currenly bind from foriegn namespace to current namespace is allowed. This patch now allows: binds/rbinds from any namespace to any other namespace, under the assumption that if a process has access to a namespace, it ought to have permission to manipulate that namespace. The only case disallowed is binds/rbinds to a detached vfsmount. The patch incorporates ideas from Miklos and Jamie, and is dependent on Miklos's "fix race in mark_mounts_for_expiry" patch to function correctly. Also it depends on Miklos's "fix bind mount from foreign namespace" patch, because without that patch umounts would fail. Signed off by Ram Pai <linuxram@us.ibm.com> --- /home/linux/views/linux-2.6.12-rc4/fs/namespace.c 2005-05-06 23:22:29.000000000 -0700 +++ linux-2.6.12-rc4/fs/namespace.c 2005-05-22 13:40:12.000000000 -0700 @@ -616,11 +616,14 @@ out_unlock: } /* - * do loopback mount. + * do loopback mount. The loopback mount can be done from any namespace + * to any other namespace including the current namespace, as long as + * the task acquired rights to manipulate them. */ static int do_loopback(struct nameidata *nd, char *old_name, int recurse) { struct nameidata old_nd; + struct namespace *mntpt_ns, *old_ns; struct vfsmount *mnt = NULL; int err = mount_is_safe(nd); if (err) @@ -631,16 +634,58 @@ static int do_loopback(struct nameidata if (err) return err; - down_write(¤t->namespace->sem); err = -EINVAL; - if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) { - err = -ENOMEM; - if (recurse) - mnt = copy_tree(old_nd.mnt, old_nd.dentry); - else - mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + spin_lock(&vfsmount_lock); + + /* + * Allow all kinds of binds and rbinds except + * recursive bind to a detached mount + */ + mntpt_ns = nd->mnt->mnt_namespace; + old_ns = old_nd.mnt->mnt_namespace; + if (!mntpt_ns || !mntpt_ns->root) { + spin_unlock(&vfsmount_lock); + goto out; } + if (!old_ns || !old_ns->root) + recurse = 0; + + get_namespace(mntpt_ns); + if (recurse && old_ns != mntpt_ns) + get_namespace(old_ns); + + spin_unlock(&vfsmount_lock); + + /* + * make sure we don't race with some other thread manipulating the + * namespaces. + */ + if (recurse && (old_ns < mntpt_ns)) + down_write(&old_ns->sem); + down_write(&mntpt_ns->sem); + if (recurse && (old_ns > mntpt_ns)) + down_write(&old_ns->sem); + + + /* + * well... mounts might have detached while we acquired + * the semaphores. Revalidate that the destination mount + * is still attached. + */ + if (!nd->mnt->mnt_namespace) + goto error_out; + + /* + * and if the source mount is detached, than just do + * a bind, instead of a rbind + */ + err = -ENOMEM; + if (recurse && old_nd.mnt->mnt_namespace) + mnt = copy_tree(old_nd.mnt, old_nd.dentry); + else + mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + if (mnt) { /* stop bind mounts from expiring */ spin_lock(&vfsmount_lock); @@ -656,7 +701,15 @@ static int do_loopback(struct nameidata mntput(mnt); } - up_write(¤t->namespace->sem); +error_out: + up_write(&mntpt_ns->sem); + if (recurse && (old_ns != mntpt_ns)) { + up_write(&old_ns->sem); + put_namespace(old_ns); + } + put_namespace(mntpt_ns); + +out: path_release(&old_nd); return err; } ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-22 20:51 ` Ram @ 2005-05-23 5:08 ` Miklos Szeredi 2005-05-23 7:24 ` Ram 0 siblings, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-23 5:08 UTC (permalink / raw) To: linuxram; +Cc: linux-kernel, linux-fsdevel, akpm, viro, jamie > The patch failed rbinds in some cases. Fixed it. The enclosed patch > has a high chance of being bug free. Except not setting mnt_namespace corretly, it does seem to be ok. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-23 5:08 ` Miklos Szeredi @ 2005-05-23 7:24 ` Ram 2005-05-23 8:24 ` Miklos Szeredi 0 siblings, 1 reply; 36+ messages in thread From: Ram @ 2005-05-23 7:24 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel, Andrew Morton, viro, jamie [-- Attachment #1: Type: text/plain, Size: 518 bytes --] On Sun, 2005-05-22 at 22:08, Miklos Szeredi wrote: > > The patch failed rbinds in some cases. Fixed it. The enclosed patch > > has a high chance of being bug free. > > Except not setting mnt_namespace corretly, it does seem to be ok. yes done. enclosed the patch. But seems like this patch opens up questions like: Should mounts/umounts/remounts/pivot_root in foreign namespaces be allowed? Probably check_mnt() can simply go, and the correct semaphores have to be held in each of the functions. RP > > Miklos [-- Attachment #2: rbind_across_namespace5.patch --] [-- Type: text/x-patch, Size: 5372 bytes --] Summary: Allows rbinds across any two namespaces. NOTE: currenly bind from foriegn namespace to current namespace is allowed. This patch now allows: 1. binds/rbinds from any namespace to any other namespace, under the assumption that if a process has access to a namespace, it ought to have permission to manipulate that namespace. 2. bind/rbind to detached mounts are disallowed. 3. rbind from detached mount is treated as bind. The patch incorporates ideas from Miklos and Jamie, and is dependent on Miklos's "fix race in mark_mounts_for_expiry" patch to function correctly. Also it depends on Miklos's "fix bind mount from foreign namespace" patch, because without that patch umounts would fail. --- /home/linux/views/linux-2.6.12-rc4/fs/namespace.c 2005-05-06 23:22:29.000000000 -0700 +++ linux-2.6.12-rc4/fs/namespace.c 2005-05-23 00:20:39.000000000 -0700 @@ -148,7 +148,7 @@ static struct vfsmount *next_mnt(struct } static struct vfsmount * -clone_mnt(struct vfsmount *old, struct dentry *root) +clone_mnt(struct vfsmount *old, struct dentry *root, struct namespace *ns) { struct super_block *sb = old->mnt_sb; struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname); @@ -160,7 +160,7 @@ clone_mnt(struct vfsmount *old, struct d mnt->mnt_root = dget(root); mnt->mnt_mountpoint = mnt->mnt_root; mnt->mnt_parent = mnt; - mnt->mnt_namespace = old->mnt_namespace; + mnt->mnt_namespace = ns; /* stick the duplicate mount on the same expiry list * as the original if that was on one */ @@ -533,13 +533,14 @@ lives_below_in_same_fs(struct dentry *d, } } -static struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry) +static struct vfsmount *copy_tree(struct vfsmount *mnt, struct dentry *dentry, + struct namespace *ns) { struct vfsmount *res, *p, *q, *r, *s; struct list_head *h; struct nameidata nd; - res = q = clone_mnt(mnt, dentry); + res = q = clone_mnt(mnt, dentry, ns); if (!q) goto Enomem; q->mnt_mountpoint = mnt->mnt_mountpoint; @@ -558,7 +559,7 @@ static struct vfsmount *copy_tree(struct p = s; nd.mnt = q; nd.dentry = p->mnt_mountpoint; - q = clone_mnt(p, p->mnt_root); + q = clone_mnt(p, p->mnt_root, ns); if (!q) goto Enomem; spin_lock(&vfsmount_lock); @@ -616,11 +617,14 @@ out_unlock: } /* - * do loopback mount. + * do loopback mount. The loopback mount can be done from any namespace to any + * other namespace including the current namespace, as long as the task acquired + * rights to manipulate them. */ static int do_loopback(struct nameidata *nd, char *old_name, int recurse) { struct nameidata old_nd; + struct namespace *mntpt_ns, *old_ns; struct vfsmount *mnt = NULL; int err = mount_is_safe(nd); if (err) @@ -631,16 +635,58 @@ static int do_loopback(struct nameidata if (err) return err; - down_write(¤t->namespace->sem); err = -EINVAL; - if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) { - err = -ENOMEM; - if (recurse) - mnt = copy_tree(old_nd.mnt, old_nd.dentry); - else - mnt = clone_mnt(old_nd.mnt, old_nd.dentry); + spin_lock(&vfsmount_lock); + + /* + * Disallow bind/rbind mounts to detached mounts and treat rbind from + * detached mount as normal bind. + */ + mntpt_ns = nd->mnt->mnt_namespace; + old_ns = old_nd.mnt->mnt_namespace; + if (!mntpt_ns || !mntpt_ns->root) { + spin_unlock(&vfsmount_lock); + goto out; } + if (!old_ns || !old_ns->root) + recurse = 0; + + get_namespace(mntpt_ns); + if (recurse && old_ns != mntpt_ns) + get_namespace(old_ns); + + spin_unlock(&vfsmount_lock); + + /* + * make sure we don't race with some other thread manipulating the + * namespaces. + */ + if (recurse && (old_ns < mntpt_ns)) + down_write(&old_ns->sem); + down_write(&mntpt_ns->sem); + if (recurse && (old_ns > mntpt_ns)) + down_write(&old_ns->sem); + + + /* + * well... mounts might have detached while we acquired + * the semaphores. Revalidate that the destination mount + * is still attached. + */ + if (!nd->mnt->mnt_namespace) + goto error_out; + + /* + * and if the source mount is detached, than just do + * a bind, instead of a rbind + */ + err = -ENOMEM; + if (recurse && old_nd.mnt->mnt_namespace) + mnt = copy_tree(old_nd.mnt, old_nd.dentry, mntpt_ns); + else + mnt = clone_mnt(old_nd.mnt, old_nd.dentry, mntpt_ns); + if (mnt) { /* stop bind mounts from expiring */ spin_lock(&vfsmount_lock); @@ -656,7 +702,15 @@ static int do_loopback(struct nameidata mntput(mnt); } - up_write(¤t->namespace->sem); +error_out: + up_write(&mntpt_ns->sem); + if (recurse && (old_ns != mntpt_ns)) { + up_write(&old_ns->sem); + put_namespace(old_ns); + } + put_namespace(mntpt_ns); + +out: path_release(&old_nd); return err; } @@ -1090,7 +1144,8 @@ int copy_namespace(int flags, struct tas down_write(&tsk->namespace->sem); /* First pass: copy the tree topology */ - new_ns->root = copy_tree(namespace->root, namespace->root->mnt_root); + new_ns->root = copy_tree(namespace->root, namespace->root->mnt_root, + new_ns); if (!new_ns->root) { up_write(&tsk->namespace->sem); kfree(new_ns); @@ -1108,7 +1163,6 @@ int copy_namespace(int flags, struct tas p = namespace->root; q = new_ns->root; while (p) { - q->mnt_namespace = new_ns; if (fs) { if (p == fs->rootmnt) { rootmnt = p; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-23 7:24 ` Ram @ 2005-05-23 8:24 ` Miklos Szeredi 0 siblings, 0 replies; 36+ messages in thread From: Miklos Szeredi @ 2005-05-23 8:24 UTC (permalink / raw) To: linuxram; +Cc: linux-kernel, linux-fsdevel, akpm, viro, jamie > yes done. enclosed the patch. > > But seems like this patch opens up questions like: > > Should mounts/umounts/remounts/pivot_root in foreign namespaces > be allowed? I think yes. Moving a subtree to a different namespace is a bit tricky so maybe move should still be restricted to be within a single namespace. And I think we should relax the checks under /proc, to allow proper access to foreign namespaces as far as it doesn't impact security. E.g. it makes sense to allow a process to access /proc/self/fd/XXX even if XXX resides in a different namespace. Currently that is not allowed. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 8:45 ` Ram 2005-05-21 9:09 ` Miklos Szeredi @ 2005-05-21 9:48 ` Miklos Szeredi 1 sibling, 0 replies; 36+ messages in thread From: Miklos Szeredi @ 2005-05-21 9:48 UTC (permalink / raw) To: linuxram; +Cc: linux-kernel, linux-fsdevel, akpm, viro, jamie > I am not getting this comment. R u assuming that a detached mount > will have NULL namespace? If so I dont see it being the case. > Or am I missing some subtle point? On a related note: now that it's not necessarily current->namespace that's used as a destination for the mount, you'll have to pass mntpt_ns into copy_tree() and clone_mnt() so that mnt->mnt_namespace can correctly be set. That will also enable some cleanup in copy_namespace(), where you can remove the of setting ->mnt_namespace. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 8:09 ` Miklos Szeredi 2005-05-21 8:45 ` Ram @ 2005-05-21 13:46 ` Jamie Lokier 2005-05-22 8:08 ` Miklos Szeredi 1 sibling, 1 reply; 36+ messages in thread From: Jamie Lokier @ 2005-05-21 13:46 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linuxram, linux-kernel, linux-fsdevel, akpm, viro Miklos Szeredi wrote: > I still see a problem: what if old_nd.mnt is already detached, and > bind is non-recursive. Now it fails with EINVAL, though it used to > work (and I think is very useful). Hey, you just made another argument for not detaching mounts when the last task with that current->namespace exits, but instead detaching mounts when the last reference to any vfsmnt in the namespace is dropped. Hint :) -- Jamie ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 13:46 ` Jamie Lokier @ 2005-05-22 8:08 ` Miklos Szeredi 2005-05-22 17:04 ` [RFC][PATCH] /proc/dead_mounts support (Was: [RFC][PATCH] rbind across ...) Miklos Szeredi ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Miklos Szeredi @ 2005-05-22 8:08 UTC (permalink / raw) To: jamie; +Cc: linuxram, linux-kernel, linux-fsdevel, akpm, viro > > I still see a problem: what if old_nd.mnt is already detached, and > > bind is non-recursive. Now it fails with EINVAL, though it used to > > work (and I think is very useful). > > Hey, you just made another argument for not detaching mounts when the > last task with that current->namespace exits, but instead detaching > mounts when the last reference to any vfsmnt in the namespace is dropped. > > Hint :) I have a better idea: - create a "dead_mounts" namespace. - chain each detached mount's ->mnt_list on dead_mounts->list - set mnt_namespace to dead_mounts - export the list via proc through the usual mount list interface The last would be a nice bonus: I've always wanted to see the list of detached, but not-yet destroyed mounts. Does anybody see a problem with that? Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* [RFC][PATCH] /proc/dead_mounts support (Was: [RFC][PATCH] rbind across ...) 2005-05-22 8:08 ` Miklos Szeredi @ 2005-05-22 17:04 ` Miklos Szeredi 2005-05-22 21:10 ` [RFC][PATCH] rbind across namespaces Ram 2005-05-24 0:39 ` Mike Waychison 2 siblings, 0 replies; 36+ messages in thread From: Miklos Szeredi @ 2005-05-22 17:04 UTC (permalink / raw) To: miklos; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro > I have a better idea: > > - create a "dead_mounts" namespace. > - chain each detached mount's ->mnt_list on dead_mounts->list > - set mnt_namespace to dead_mounts > - export the list via proc through the usual mount list interface > > The last would be a nice bonus: I've always wanted to see the list of > detached, but not-yet destroyed mounts. Here's a patch that does this. It needs all the other namespace patches, so I made a diff against 2.6.12-rc4-mm2 in case anybody wants to try it: http://www.inf.bme.hu/~mszeredi/patches/patch-2.6.12-rc4-mm2-szm1 It works for me(TM). Miklos Index: linux/include/linux/proc_fs.h =================================================================== --- linux.orig/include/linux/proc_fs.h 2005-05-22 16:09:37.000000000 +0200 +++ linux/include/linux/proc_fs.h 2005-05-22 16:09:45.000000000 +0200 @@ -127,6 +127,7 @@ extern struct dentry *proc_lookup(struct extern struct file_operations proc_kcore_operations; extern struct file_operations proc_kmsg_operations; extern struct file_operations ppc_htab_operations; +extern struct file_operations proc_dead_mounts_operations; /* * proc_tty.c Index: linux/fs/proc/proc_misc.c =================================================================== --- linux.orig/fs/proc/proc_misc.c 2005-05-22 16:09:37.000000000 +0200 +++ linux/fs/proc/proc_misc.c 2005-05-22 16:09:45.000000000 +0200 @@ -652,6 +652,9 @@ void __init proc_misc_init(void) create_proc_read_entry(p->name, 0, NULL, p->read_proc, NULL); proc_symlink("mounts", NULL, "self/mounts"); + entry = create_proc_entry("dead_mounts", S_IRUSR, &proc_root); + if (entry) + entry->proc_fops = &proc_dead_mounts_operations; /* And now for trickier ones */ entry = create_proc_entry("kmsg", S_IRUSR, &proc_root); Index: linux/fs/namespace.c =================================================================== --- linux.orig/fs/namespace.c 2005-05-22 16:09:45.000000000 +0200 +++ linux/fs/namespace.c 2005-05-22 16:12:23.000000000 +0200 @@ -42,6 +42,7 @@ static inline int sysfs_init(void) static struct list_head *mount_hashtable; static int hash_mask, hash_bits; static kmem_cache_t *mnt_cache; +static struct namespace *dead_mounts; static inline unsigned long hash(struct vfsmount *mnt, struct dentry *dentry) { @@ -175,6 +176,8 @@ clone_mnt(struct vfsmount *old, struct d void __mntput(struct vfsmount *mnt) { struct super_block *sb = mnt->mnt_sb; + list_del(&mnt->mnt_list); + spin_unlock(&vfsmount_lock); dput(mnt->mnt_root); free_vfsmnt(mnt); deactivate_super(sb); @@ -240,7 +243,10 @@ static int show_vfsmnt(struct seq_file * mangle(m, mnt->mnt_devname ? mnt->mnt_devname : "none"); seq_putc(m, ' '); - seq_path(m, mnt, mnt->mnt_root, " \t\n\\"); + if (mnt->mnt_namespace != dead_mounts) + seq_path(m, mnt, mnt->mnt_root, " \t\n\\"); + else + seq_puts(m, "(detached)"); seq_putc(m, ' '); mangle(m, mnt->mnt_sb->s_type->name); seq_puts(m, mnt->mnt_sb->s_flags & MS_RDONLY ? " ro" : " rw"); @@ -265,6 +271,71 @@ struct seq_operations mounts_op = { .show = show_vfsmnt }; +/* /proc/dead_mounts needs slightly different handling than above */ +static void *dead_m_start(struct seq_file *m, loff_t *pos) +{ + struct namespace *n = m->private; + struct vfsmount *res = NULL; + struct list_head *p; + loff_t l = *pos; + + spin_lock(&vfsmount_lock); + list_for_each(p, &n->list) + if (!l--) { + res = mntget(list_entry(p, struct vfsmount, mnt_list)); + break; + } + spin_unlock(&vfsmount_lock); + return res; +} + +static void *dead_m_next(struct seq_file *m, void *v, loff_t *pos) +{ + struct namespace *n = m->private; + struct vfsmount *curr = v; + struct vfsmount *res = NULL; + struct list_head *p; + + (*pos)++; + spin_lock(&vfsmount_lock); + p = curr->mnt_list.next; + if (p != &n->list) + res = mntget(list_entry(p, struct vfsmount, mnt_list)); + spin_unlock(&vfsmount_lock); + mntput(curr); + return res; +} + +static void dead_m_stop(struct seq_file *m, void *v) +{ + struct vfsmount *curr = v; + mntput(curr); +} + +static struct seq_operations dead_mounts_op = { + .start = dead_m_start, + .next = dead_m_next, + .stop = dead_m_stop, + .show = show_vfsmnt +}; + +static int dead_mounts_open(struct inode *inode, struct file *file) +{ + int ret = seq_open(file, &dead_mounts_op); + if (!ret) { + struct seq_file *m = file->private_data; + m->private = dead_mounts; + } + return ret; +} + +struct file_operations proc_dead_mounts_operations = { + .open = dead_mounts_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + /** * may_umount_tree - check if a mount tree is busy * @mnt: root of mount tree @@ -343,14 +414,13 @@ static void umount_tree(struct vfsmount LIST_HEAD(kill); for (p = mnt; p; p = next_mnt(p, mnt)) { - list_del(&p->mnt_list); - list_add(&p->mnt_list, &kill); - p->mnt_namespace = NULL; + list_move(&p->mnt_list, &kill); + p->mnt_namespace = dead_mounts; } while (!list_empty(&kill)) { mnt = list_entry(kill.next, struct vfsmount, mnt_list); - list_del_init(&mnt->mnt_list); + list_move_tail(&mnt->mnt_list, &dead_mounts->list); list_del_init(&mnt->mnt_fslink); if (mnt->mnt_parent == mnt) { spin_unlock(&vfsmount_lock); @@ -447,7 +517,7 @@ static int do_umount(struct vfsmount *mn } retval = -EBUSY; if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) { - if (!list_empty(&mnt->mnt_list)) + if (mnt->mnt_namespace != dead_mounts) umount_tree(mnt); retval = 0; } @@ -842,8 +912,8 @@ static void expire_mount(struct vfsmount struct nameidata old_nd; /* delete from the namespace */ - list_del_init(&mnt->mnt_list); - mnt->mnt_namespace = NULL; + list_move_tail(&mnt->mnt_list, &dead_mounts->list); + mnt->mnt_namespace = dead_mounts; detach_mnt(mnt, &old_nd); spin_unlock(&vfsmount_lock); path_release(&old_nd); @@ -1387,6 +1457,14 @@ static void __init init_mount_tree(void) namespace->root = mnt; mnt->mnt_namespace = namespace; + dead_mounts = kmalloc(sizeof(struct namespace), GFP_KERNEL); + if (!dead_mounts) + panic("Can't allocate dead_mounts namespace"); + atomic_set(&dead_mounts->count, 1); + INIT_LIST_HEAD(&dead_mounts->list); + init_rwsem(&dead_mounts->sem); + dead_mounts->root = NULL; + init_task.namespace = namespace; read_lock(&tasklist_lock); do_each_thread(g, p) { Index: linux/include/linux/mount.h =================================================================== --- linux.orig/include/linux/mount.h 2005-05-22 16:09:37.000000000 +0200 +++ linux/include/linux/mount.h 2005-05-22 16:09:45.000000000 +0200 @@ -46,11 +46,12 @@ static inline struct vfsmount *mntget(st } extern void __mntput(struct vfsmount *mnt); +extern spinlock_t vfsmount_lock; static inline void _mntput(struct vfsmount *mnt) { if (mnt) { - if (atomic_dec_and_test(&mnt->mnt_count)) + if (atomic_dec_and_lock(&mnt->mnt_count, &vfsmount_lock)) __mntput(mnt); } } @@ -75,7 +76,5 @@ extern int do_add_mount(struct vfsmount extern void mark_mounts_for_expiry(struct list_head *mounts); -extern spinlock_t vfsmount_lock; - #endif #endif /* _LINUX_MOUNT_H */ ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-22 8:08 ` Miklos Szeredi 2005-05-22 17:04 ` [RFC][PATCH] /proc/dead_mounts support (Was: [RFC][PATCH] rbind across ...) Miklos Szeredi @ 2005-05-22 21:10 ` Ram 2005-05-23 5:07 ` Miklos Szeredi 2005-05-24 0:39 ` Mike Waychison 2 siblings, 1 reply; 36+ messages in thread From: Ram @ 2005-05-22 21:10 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jamie, linux-kernel, linux-fsdevel, Andrew Morton, viro On Sun, 2005-05-22 at 01:08, Miklos Szeredi wrote: > > > I still see a problem: what if old_nd.mnt is already detached, and > > > bind is non-recursive. Now it fails with EINVAL, though it used to > > > work (and I think is very useful). > > > > Hey, you just made another argument for not detaching mounts when the > > last task with that current->namespace exits, but instead detaching > > mounts when the last reference to any vfsmnt in the namespace is dropped. > > > > Hint :) > > I have a better idea: > > - create a "dead_mounts" namespace. > - chain each detached mount's ->mnt_list on dead_mounts->list > - set mnt_namespace to dead_mounts > - export the list via proc through the usual mount list interface > > The last would be a nice bonus: I've always wanted to see the list of > detached, but not-yet destroyed mounts. > > Does anybody see a problem with that? Yes. :) because I will have to change my 'rbind across namespace' patch because now detached mounts will have dead_mounts namespace instead of null namespace. RP > > Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-22 21:10 ` [RFC][PATCH] rbind across namespaces Ram @ 2005-05-23 5:07 ` Miklos Szeredi 0 siblings, 0 replies; 36+ messages in thread From: Miklos Szeredi @ 2005-05-23 5:07 UTC (permalink / raw) To: linuxram; +Cc: miklos, jamie, linux-kernel, linux-fsdevel, akpm, viro > Yes. :) because I will have to change my 'rbind across namespace' patch > because now detached mounts will have dead_mounts namespace instead of > null namespace. I was thinking, that it would get rid of all special casing, but I realize now, that the ns->root != NULL check still has to be done. Oh well... Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-22 8:08 ` Miklos Szeredi 2005-05-22 17:04 ` [RFC][PATCH] /proc/dead_mounts support (Was: [RFC][PATCH] rbind across ...) Miklos Szeredi 2005-05-22 21:10 ` [RFC][PATCH] rbind across namespaces Ram @ 2005-05-24 0:39 ` Mike Waychison 2005-05-24 5:43 ` Miklos Szeredi 2 siblings, 1 reply; 36+ messages in thread From: Mike Waychison @ 2005-05-24 0:39 UTC (permalink / raw) To: Miklos Szeredi; +Cc: linuxram, linux-kernel, linux-fsdevel, akpm, viro Miklos Szeredi wrote: >>>I still see a problem: what if old_nd.mnt is already detached, and >>>bind is non-recursive. Now it fails with EINVAL, though it used to >>>work (and I think is very useful). >> >>Hey, you just made another argument for not detaching mounts when the >>last task with that current->namespace exits, but instead detaching >>mounts when the last reference to any vfsmnt in the namespace is dropped. >> >>Hint :) > > > I have a better idea: > > - create a "dead_mounts" namespace. > - chain each detached mount's ->mnt_list on dead_mounts->list > - set mnt_namespace to dead_mounts > - export the list via proc through the usual mount list interface > > The last would be a nice bonus: I've always wanted to see the list of > detached, but not-yet destroyed mounts. > > Does anybody see a problem with that? > FWIW, all this stuff has already been done and posted here. Detachable chunks of vfsmounts: http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109872862003192&w=2 'Soft' reference counts for manipulating vfsmounts without pinning them down: http://marc.theaimsgroup.com/?l=linux-kernel&m=109872797030644&w=2 Referencing vfsmounts in userspace using a file descriptor: http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2 walking mountpoints in userspace: http://marc.theaimsgroup.com/?l=linux-kernel&m=109875012510262&w=2 attaching mountpoints in userspace: http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109875063100111&w=2 detaching mountpoints in userspace: http://marc.theaimsgroup.com/?l=linux-kernel&m=109880051800963&w=2 getting info from a vfsmount: http://marc.theaimsgroup.com/?l=linux-kernel&m=109875135030473&w=2 Mike Waychison ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 0:39 ` Mike Waychison @ 2005-05-24 5:43 ` Miklos Szeredi 2005-05-24 7:13 ` Mike Waychison 0 siblings, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-24 5:43 UTC (permalink / raw) To: mikew; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro > FWIW, all this stuff has already been done and posted here. > > Detachable chunks of vfsmounts: > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109872862003192&w=2 > > 'Soft' reference counts for manipulating vfsmounts without pinning them > down: > http://marc.theaimsgroup.com/?l=linux-kernel&m=109872797030644&w=2 I think this might just interest Jamie Lokier. He had a very similar poposal recently, but without reference to this patch, so I guess he wasn't aware of it. > Referencing vfsmounts in userspace using a file descriptor: > http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2 Why not just use /proc/PID/fd/FD? > walking mountpoints in userspace: > http://marc.theaimsgroup.com/?l=linux-kernel&m=109875012510262&w=2 Is this needed? Userspace can find out mountpoints from /proc/mounts (or something similar for detached trees). > attaching mountpoints in userspace: > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109875063100111&w=2 Again, bind from/to /proc/PID/fd/FD should work without any new interfaces. > detaching mountpoints in userspace: > http://marc.theaimsgroup.com/?l=linux-kernel&m=109880051800963&w=2 What's wrong with sys_umount()? > getting info from a vfsmount: > http://marc.theaimsgroup.com/?l=linux-kernel&m=109875135030473&w=2 /proc or /sys should do fine for this purpose I think. I agree, that having "floating trees" could be useful, but I don't see the point of adding new interfaces to support it. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 5:43 ` Miklos Szeredi @ 2005-05-24 7:13 ` Mike Waychison 2005-05-24 8:25 ` Miklos Szeredi 2005-05-24 9:18 ` Miklos Szeredi 0 siblings, 2 replies; 36+ messages in thread From: Mike Waychison @ 2005-05-24 7:13 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro Miklos Szeredi wrote: >>FWIW, all this stuff has already been done and posted here. >> >>Detachable chunks of vfsmounts: >>http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109872862003192&w=2 >> >>'Soft' reference counts for manipulating vfsmounts without pinning them >>down: >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109872797030644&w=2 > > > I think this might just interest Jamie Lokier. He had a very similar > poposal recently, but without reference to this patch, so I guess he > wasn't aware of it. > Interesting. I haven't been following LKML/fsdevel lately due to lack of time. > >>Referencing vfsmounts in userspace using a file descriptor: >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2 > > > Why not just use /proc/PID/fd/FD? In what sense? readlink of /proc/PID/fd/* will provide a pathname relative to current's root: useless for any paths not in current's namespace. Also, if we were to hijack /proc/PID/fd/* for cross namespace manipulation, then we'd be enabling any root user on the system to modify anyone's namespace. Any security *cough* provided by namespaces is lost. A more secure way is to have root in namespace A allow root in namespace B do the mounts. If you further restrict how this hand-off happens, such as the walking constraints in the patch mentioned below, we can restrict modification of a namespace to a given sub-tree of vfsmounts. This interface also has the huge advantage that you gain all the goodies of using file descriptors, such as SCM_RIGHTS. You can hand of entire trees of mountpoints between applications without ever even binding them to any namespace whatsoever. Tie this in with some userspace code that can mount devices for users with restrictions and appropriate policy, you can create some API+daemon for regular user apps to get things mounted in a way that guarantees hiding from other users. > > >>walking mountpoints in userspace: >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109875012510262&w=2 > > > Is this needed? Userspace can find out mountpoints from /proc/mounts > (or something similar for detached trees). > With detached mountpoints (and especially with detached mountpoint _trees_) it can become very difficult to assess which trees are which. Also, just like /proc/PID/fd/*, /proc/mounts is built according to _current_'s root. This only gives a skewed view of what is going on. > >>attaching mountpoints in userspace: >>http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109875063100111&w=2 > > > Again, bind from/to /proc/PID/fd/FD should work without any new > interfaces. No.. It wouldn't. Pathname resolution is doing everything according to the ->readlink information provided by this magic proc files, again in current's namespace. If you care to hijack ->follow_link, prepare yourself for a slew of corner cases. > > >>detaching mountpoints in userspace: >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109880051800963&w=2 > > > What's wrong with sys_umount()? sys_umount only works with paths in current's namespace. It doesn't allow you to handle vfsmounts as primaries in userspace. > > >>getting info from a vfsmount: >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109875135030473&w=2 > > > /proc or /sys should do fine for this purpose I think. > Sure, if you can look it up somehow. Even if you could currently walk around in another namespace using fchdir+chdir, you couldn't pull out kernel-knowledge of mountpoints, you have to fall back to /etc/mtab, which is completely broken when you mix in namespaces anyway.. > I agree, that having "floating trees" could be useful, but I don't see > the point of adding new interfaces to support it. > I'm not hugely tied to the idea at the moment. I implemented it as part of this interface cause it was a simple extension to what was being done. > Miklos > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 7:13 ` Mike Waychison @ 2005-05-24 8:25 ` Miklos Szeredi 2005-05-24 17:09 ` Mike Waychison 2005-05-24 9:18 ` Miklos Szeredi 1 sibling, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-24 8:25 UTC (permalink / raw) To: mike; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro > > > >>Referencing vfsmounts in userspace using a file descriptor: > >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2 > > > > > > Why not just use /proc/PID/fd/FD? > > In what sense? readlink of /proc/PID/fd/* will provide a pathname > relative to current's root: useless for any paths not in current's > namespace. Not readlink, but actual dereference of link will give you exactly the vfsmount/dentry the file was opened on. If you want to bind/move whatever on that mount, that's possible, even if it's a "detached tree". > Also, if we were to hijack /proc/PID/fd/* for cross namespace > manipulation, then we'd be enabling any root user on the system to > modify anyone's namespace. No. Obviously there must be limitations on which process can access /proc/PID/fd. It's obviously safe to allow 'self' for example. But any process you can ptrace should also be safe. > This interface also has the huge advantage that you gain all the goodies > of using file descriptors, such as SCM_RIGHTS. You can hand of entire > trees of mountpoints between applications without ever even binding them > to any namespace whatsoever. Why is that dependent on the interface? The /proc interface already allows _everything_ you want to do with file descriptors. Only the the necessary restrictions should be worked out. > Tie this in with some userspace code that can mount devices for users > with restrictions and appropriate policy, you can create some API+daemon > for regular user apps to get things mounted in a way that guarantees > hiding from other users. Yes. And I think that can be done without any new magic ioctls. Just with mount/umount and /proc. The implementation is another question. I'll look through your patches to see how you achieve all this. Thanks, Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 8:25 ` Miklos Szeredi @ 2005-05-24 17:09 ` Mike Waychison 2005-05-24 17:31 ` Miklos Szeredi 0 siblings, 1 reply; 36+ messages in thread From: Mike Waychison @ 2005-05-24 17:09 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro Miklos Szeredi wrote: >>>>Referencing vfsmounts in userspace using a file descriptor: >>>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109871948812782&w=2 >>> >>> >>>Why not just use /proc/PID/fd/FD? >> >>In what sense? readlink of /proc/PID/fd/* will provide a pathname >>relative to current's root: useless for any paths not in current's >>namespace. > > > Not readlink, but actual dereference of link will give you exactly the > vfsmount/dentry the file was opened on. If you want to bind/move > whatever on that mount, that's possible, even if it's a "detached > tree". Removing proc_check_root essentially removes any protection that namespaces provided in the first place. Think of it like virtual memory. You can fork off and create your own COW instance, and no one can see what you are doing unless they ptrace you or explicitly ask you. The mountfd model allows for explicit handing off of vfsmounts between namespaces without allowing arbitrary access. > > >>Also, if we were to hijack /proc/PID/fd/* for cross namespace >>manipulation, then we'd be enabling any root user on the system to >>modify anyone's namespace. > > > No. Obviously there must be limitations on which process can access > /proc/PID/fd. It's obviously safe to allow 'self' for example. But > any process you can ptrace should also be safe. > Yet even this doesn't allow userspace to define it's own policy for inter-namespace manipulation. > >>This interface also has the huge advantage that you gain all the goodies >>of using file descriptors, such as SCM_RIGHTS. You can hand of entire >>trees of mountpoints between applications without ever even binding them >>to any namespace whatsoever. > > > Why is that dependent on the interface? The /proc interface already > allows _everything_ you want to do with file descriptors. Only the > the necessary restrictions should be worked out. > Worked out where? Allowing this stuff to happen in /proc forces you to set this policy in the kernel. > >>Tie this in with some userspace code that can mount devices for users >>with restrictions and appropriate policy, you can create some API+daemon >>for regular user apps to get things mounted in a way that guarantees >>hiding from other users. > > > Yes. And I think that can be done without any new magic ioctls. Just > with mount/umount and /proc. > > The implementation is another question. I'll look through your > patches to see how you achieve all this. > Beware that due to the detached-subtrees bit, the locking became a bit ugly, requiring a global rw_lock for mntget/mntput. I still haven't figured out a better way to keep per-vfsmounts counts and per-subtree counts in sync. That said, the common case is a read lock, for shared access. Only on mnt_attach / mnt_detach (usually a slowpath anyway) do we require exclusive access. This would have been a good candidate for brlocks if they were still around. Mike Waychison ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 17:09 ` Mike Waychison @ 2005-05-24 17:31 ` Miklos Szeredi 2005-05-24 17:44 ` Mike Waychison 0 siblings, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-24 17:31 UTC (permalink / raw) To: mikew; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro > >>In what sense? readlink of /proc/PID/fd/* will provide a pathname > >>relative to current's root: useless for any paths not in current's > >>namespace. > > > > > > Not readlink, but actual dereference of link will give you exactly the > > vfsmount/dentry the file was opened on. If you want to bind/move > > whatever on that mount, that's possible, even if it's a "detached > > tree". > > Removing proc_check_root essentially removes any protection that > namespaces provided in the first place. > > Think of it like virtual memory. You can fork off and create your own > COW instance, and no one can see what you are doing unless they ptrace > you or explicitly ask you. The mountfd model allows for explicit > handing off of vfsmounts between namespaces without allowing arbitrary > access. Note: I didn't say we should remove proc_check_root(). I said _if_ you remove it, _then_ mounting on foreign namespace will work, which has actually been confirmed experimentally. So your follow_link argument doesn't hold. The proc code does the follow_link in some clever way, that the looked up object will end up with the same vfsmount/dentry pair as the file. > Yet even this doesn't allow userspace to define it's own policy for > inter-namespace manipulation. OK. Let's keep the kernel policy simple: just allow a process to access it's _own_ file descriptors in proc. I.e. allow access to /proc/self/fd/* even if it comes from a foreign namespace. Then the policy (who passes whom the file descriptors) is entirely up to userspace (just as with your scheme). /proc/self/fd/FD doesn't give any extra rights to the process, it just makes mounting from/to detached mounts, and foreign namespaces possible without new interfaces. > Beware that due to the detached-subtrees bit, the locking became a bit > ugly, requiring a global rw_lock for mntget/mntput. I still haven't > figured out a better way to keep per-vfsmounts counts and per-subtree > counts in sync. Did you measure the effect on performace? Maybe it isn't so bad. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 17:31 ` Miklos Szeredi @ 2005-05-24 17:44 ` Mike Waychison 2005-05-24 17:56 ` Miklos Szeredi 0 siblings, 1 reply; 36+ messages in thread From: Mike Waychison @ 2005-05-24 17:44 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro Miklos Szeredi wrote: >>>>In what sense? readlink of /proc/PID/fd/* will provide a pathname >>>>relative to current's root: useless for any paths not in current's >>>>namespace. >>> >>> >>>Not readlink, but actual dereference of link will give you exactly the >>>vfsmount/dentry the file was opened on. If you want to bind/move >>>whatever on that mount, that's possible, even if it's a "detached >>>tree". >> >>Removing proc_check_root essentially removes any protection that >>namespaces provided in the first place. >> >>Think of it like virtual memory. You can fork off and create your own >>COW instance, and no one can see what you are doing unless they ptrace >>you or explicitly ask you. The mountfd model allows for explicit >>handing off of vfsmounts between namespaces without allowing arbitrary >>access. > > > Note: I didn't say we should remove proc_check_root(). I said _if_ > you remove it, _then_ mounting on foreign namespace will work, which > has actually been confirmed experimentally. > OK. > So your follow_link argument doesn't hold. The proc code does the > follow_link in some clever way, that the looked up object will end up > with the same vfsmount/dentry pair as the file. > Yes. I hadn't realized that the only safe-guard in place was proc_check_root. I had made the assumption that follow_link was using vfs_follow_link with the readlink info. > >>Yet even this doesn't allow userspace to define it's own policy for >>inter-namespace manipulation. > > > OK. Let's keep the kernel policy simple: just allow a process to > access it's _own_ file descriptors in proc. I.e. allow access to > /proc/self/fd/* even if it comes from a foreign namespace. > > Then the policy (who passes whom the file descriptors) is entirely up > to userspace (just as with your scheme). > > /proc/self/fd/FD doesn't give any extra rights to the process, it just > makes mounting from/to detached mounts, and foreign namespaces > possible without new interfaces. So you'd say 'mount /dev/foo /proc/self/fd/4' if 4 was an fd pointing to a directory in another namespace? That does require proc_check_root be removed. :\ > > >>Beware that due to the detached-subtrees bit, the locking became a bit >>ugly, requiring a global rw_lock for mntget/mntput. I still haven't >>figured out a better way to keep per-vfsmounts counts and per-subtree >>counts in sync. > > > Did you measure the effect on performace? Maybe it isn't so bad. As far as I could tell without doing high-smp benchmarks, it doesn't slow anything down. In this case, we aren't on the fastest path anyway, as we are usually a) walking a path across a mountpoint, requiring the global vfsmount_lock anyway. b) closing a file, which isn't fast either. You could micro-optimize the pivoting of from one vfsmount to another during a walk to only grab the lock once, but there may be no profound effect and any gains would be lost in the noise. Mike Waychison ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 17:44 ` Mike Waychison @ 2005-05-24 17:56 ` Miklos Szeredi 2005-05-24 18:04 ` Mike Waychison 0 siblings, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-24 17:56 UTC (permalink / raw) To: mikew; +Cc: miklos, jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro > So you'd say 'mount /dev/foo /proc/self/fd/4' if 4 was an fd pointing to > a directory in another namespace? > > That does require proc_check_root be removed. :\ Or just make an exception to self? proc_check_root() could begin with: if (current == proc_task(inode)) return 0; For all other tasks it would still be effective. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 17:56 ` Miklos Szeredi @ 2005-05-24 18:04 ` Mike Waychison 2005-05-30 19:06 ` Ram 0 siblings, 1 reply; 36+ messages in thread From: Mike Waychison @ 2005-05-24 18:04 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro Miklos Szeredi wrote: >>So you'd say 'mount /dev/foo /proc/self/fd/4' if 4 was an fd pointing to >>a directory in another namespace? >> >>That does require proc_check_root be removed. :\ > > > Or just make an exception to self? > > proc_check_root() could begin with: > > if (current == proc_task(inode)) > return 0; > > For all other tasks it would still be effective. > Yes, I think something like that is workable :) (we still have to fix up all the namespace->sem locking. I have yet to review Ram's patch.) Mike Waychison ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 18:04 ` Mike Waychison @ 2005-05-30 19:06 ` Ram 0 siblings, 0 replies; 36+ messages in thread From: Ram @ 2005-05-30 19:06 UTC (permalink / raw) To: Mike Waychison Cc: Miklos Szeredi, Jamie Lokier, linux-kernel, linux-fsdevel, Andrew Morton, viro On Tue, 2005-05-24 at 11:04, Mike Waychison wrote: > Miklos Szeredi wrote: > >>So you'd say 'mount /dev/foo /proc/self/fd/4' if 4 was an fd pointing to > >>a directory in another namespace? > >> > >>That does require proc_check_root be removed. :\ > > > > > > Or just make an exception to self? > > > > proc_check_root() could begin with: > > > > if (current == proc_task(inode)) > > return 0; > > > > For all other tasks it would still be effective. > > > > Yes, I think something like that is workable :) > > (we still have to fix up all the namespace->sem locking. I have yet to > review Ram's patch.) Yes. This patch is not fully ready yet. It still has to take care of using the correct namespace sems for operations like umount/move etc. Have been recently busy with shared subtree coding. Will work on this to remove all the assumptions in the code that think 'a process can access only its own namespace'. RP > > Mike Waychison > - > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 7:13 ` Mike Waychison 2005-05-24 8:25 ` Miklos Szeredi @ 2005-05-24 9:18 ` Miklos Szeredi 2005-05-24 17:15 ` Mike Waychison 1 sibling, 1 reply; 36+ messages in thread From: Miklos Szeredi @ 2005-05-24 9:18 UTC (permalink / raw) To: mike; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro (Sorry for replying in two parts, I missed this in the first reading) > >>walking mountpoints in userspace: > >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109875012510262&w=2 > > > > > > Is this needed? Userspace can find out mountpoints from /proc/mounts > > (or something similar for detached trees). > > > > With detached mountpoints (and especially with detached mountpoint > _trees_) it can become very difficult to assess which trees are which. > > Also, just like /proc/PID/fd/*, /proc/mounts is built according to > _current_'s root. This only gives a skewed view of what is going on. I'm thinking of /proc/PID/fdinfo/FD/mounts or similar. > >>attaching mountpoints in userspace: > >>http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109875063100111&w=2 > > > > > > Again, bind from/to /proc/PID/fd/FD should work without any new > > interfaces. > > No.. It wouldn't. Pathname resolution is doing everything according to > the ->readlink information provided by this magic proc files, again in > current's namespace. If you care to hijack ->follow_link, prepare > yourself for a slew of corner cases. No need to hijack, it's already done. Removing calls to proc_check_root() will allow access to different namespaces detached mounts, etc. It's been tried and actually works. What's more you don't even need /proc, just pass a file descriptor (with reference to mount in different namespace, etc.) to another process with SCM_RIGHTS, do fchrdir(fd), and then do mount --bind, etc. This actually works with an unmodified kernel. > >>detaching mountpoints in userspace: > >>http://marc.theaimsgroup.com/?l=linux-kernel&m=109880051800963&w=2 > > > > > > What's wrong with sys_umount()? > > sys_umount only works with paths in current's namespace. It doesn't > allow you to handle vfsmounts as primaries in userspace. But it does. Again, either with fchdir() or /proc. fchdir() has the drawback of only working on directories, and that a process has only one CWD. The /proc approach has no such limitations. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 9:18 ` Miklos Szeredi @ 2005-05-24 17:15 ` Mike Waychison 2005-05-24 17:46 ` Miklos Szeredi 2005-05-24 18:15 ` Jamie Lokier 0 siblings, 2 replies; 36+ messages in thread From: Mike Waychison @ 2005-05-24 17:15 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro Miklos Szeredi wrote: > (Sorry for replying in two parts, I missed this in the first reading) > > >>>>walking mountpoints in userspace: >>>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109875012510262&w=2 >>> >>> >>>Is this needed? Userspace can find out mountpoints from /proc/mounts >>>(or something similar for detached trees). >>> >> >>With detached mountpoints (and especially with detached mountpoint >>_trees_) it can become very difficult to assess which trees are which. >> >>Also, just like /proc/PID/fd/*, /proc/mounts is built according to >>_current_'s root. This only gives a skewed view of what is going on. > > > I'm thinking of /proc/PID/fdinfo/FD/mounts or similar. > > >>>>attaching mountpoints in userspace: >>>>http://marc.theaimsgroup.com/?l=linux-fsdevel&m=109875063100111&w=2 >>> >>> >>>Again, bind from/to /proc/PID/fd/FD should work without any new >>>interfaces. >> >>No.. It wouldn't. Pathname resolution is doing everything according to >>the ->readlink information provided by this magic proc files, again in >>current's namespace. If you care to hijack ->follow_link, prepare >>yourself for a slew of corner cases. > > > No need to hijack, it's already done. Removing calls to > proc_check_root() will allow access to different namespaces detached > mounts, etc. It's been tried and actually works. > See previous message as why we don't want to allow this. > What's more you don't even need /proc, just pass a file descriptor > (with reference to mount in different namespace, etc.) to another > process with SCM_RIGHTS, do fchrdir(fd), and then do mount --bind, > etc. This actually works with an unmodified kernel. It shouldn't. An unmodified kernel has check_mnt to keep you from doing this. > > >>>>detaching mountpoints in userspace: >>>>http://marc.theaimsgroup.com/?l=linux-kernel&m=109880051800963&w=2 >>> >>> >>>What's wrong with sys_umount()? >> >>sys_umount only works with paths in current's namespace. It doesn't >>allow you to handle vfsmounts as primaries in userspace. > > > But it does. Again, either with fchdir() or /proc. > > fchdir() has the drawback of only working on directories, and that a > process has only one CWD. The /proc approach has no such limitations. > Again, check_mnt won't let this happen. Another limitation is that manipulating mounts in another namespace breaks namespace->list locking.. Add that to a todo list if you still think this is a good idea. Mike Waychison ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 17:15 ` Mike Waychison @ 2005-05-24 17:46 ` Miklos Szeredi 2005-05-24 18:15 ` Jamie Lokier 1 sibling, 0 replies; 36+ messages in thread From: Miklos Szeredi @ 2005-05-24 17:46 UTC (permalink / raw) To: mikew; +Cc: miklos, jamie, linuxram, linux-kernel, linux-fsdevel, akpm, viro > > What's more you don't even need /proc, just pass a file descriptor > > (with reference to mount in different namespace, etc.) to another > > process with SCM_RIGHTS, do fchrdir(fd), and then do mount --bind, > > etc. This actually works with an unmodified kernel. > > It shouldn't. An unmodified kernel has check_mnt to keep you from doing > this. Common misconception. Even Al Viro fell for it, even though it was he who wrote the code :) What is not allowed is that the _new_ mountpoint is in a different namespace. The _old_ mount can actually come from a different namespace or a detached mount in a bind (not an rbind though). Again experimentally verified. The check_mnt() function is not a security measure, rather a locking measure. Current code in namespace protects modification of the mount tree with current->namespace->sem. The check_mnt() calls are needed, so that the namespace to be modified is really the current namespace. This limitation can actually be removed. See patch by Ram Pai posted earlier in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=111683338801090&w=2 The code can be generalized for other types of mounts, not just bind. With that and allowing access to /proc/self/fd/*, I think we'll have a proper interface for passing mounts between namespaces. The mount groups idea is orthogonal to this I think, and I'll send comments in a separate mail. Miklos ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 17:15 ` Mike Waychison 2005-05-24 17:46 ` Miklos Szeredi @ 2005-05-24 18:15 ` Jamie Lokier 2005-05-24 18:33 ` Mike Waychison 1 sibling, 1 reply; 36+ messages in thread From: Jamie Lokier @ 2005-05-24 18:15 UTC (permalink / raw) To: Mike Waychison Cc: Miklos Szeredi, linuxram, linux-kernel, linux-fsdevel, akpm, viro Mike Waychison wrote: > >No need to hijack, it's already done. Removing calls to > >proc_check_root() will allow access to different namespaces detached > >mounts, etc. It's been tried and actually works. > > See previous message as why we don't want to allow this. If you can ptrace any process which is in another namespace, then you _effectively_ have full access to that namespace. It's quite easy to do, and negates the supposed security of namespaces. Because of that, there's _no_ real security benefit from denying access to /proc/NNN/fd/ if you are able to ptrace task NNN. What I think makes sense is this: 1. Deny access to /proc/NNN/fd/, /proc/NNN/cwd, /proc/NNN/root if task NNN cannot be ptraced. 3. Allow entry to /proc/NNN/fd/, /proc/NNN/cwd, /proc/NNN/root if ptrace is allowed; the namespace being irrelevant. 3. Use _exactly_ the same condition as for ptracing, i.e. MAY_PTRACE in fs/proc/base.c. Ensure that condition is consistent with the tests in kernel/ptrace.c, possibly putting the condition in a common header file to keep it consistent in future. 4. If further restrictions are desired, to make namespaces more strict, those should be implemented by further restrictions on which tasks are allowed to ptrace other tasks. -- Jamie ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 18:15 ` Jamie Lokier @ 2005-05-24 18:33 ` Mike Waychison 2005-05-24 21:51 ` Jamie Lokier 0 siblings, 1 reply; 36+ messages in thread From: Mike Waychison @ 2005-05-24 18:33 UTC (permalink / raw) To: Jamie Lokier Cc: Miklos Szeredi, linuxram, linux-kernel, linux-fsdevel, akpm, viro Jamie Lokier wrote: > Mike Waychison wrote: > >>>No need to hijack, it's already done. Removing calls to >>>proc_check_root() will allow access to different namespaces detached >>>mounts, etc. It's been tried and actually works. >> >>See previous message as why we don't want to allow this. > > > If you can ptrace any process which is in another namespace, then you > _effectively_ have full access to that namespace. It's quite easy to > do, and negates the supposed security of namespaces. > > Because of that, there's _no_ real security benefit from denying > access to /proc/NNN/fd/ if you are able to ptrace task NNN. > > What I think makes sense is this: > > 1. Deny access to /proc/NNN/fd/, /proc/NNN/cwd, /proc/NNN/root > if task NNN cannot be ptraced. > > 3. Allow entry to /proc/NNN/fd/, /proc/NNN/cwd, /proc/NNN/root > if ptrace is allowed; the namespace being irrelevant. > > 3. Use _exactly_ the same condition as for ptracing, > i.e. MAY_PTRACE in fs/proc/base.c. Ensure that condition is > consistent with the tests in kernel/ptrace.c, possibly putting > the condition in a common header file to keep it consistent in > future. > > 4. If further restrictions are desired, to make namespaces more > strict, those should be implemented by further restrictions on > which tasks are allowed to ptrace other tasks. > Indeed. A combination of MAY_PTRACE ||ed with a check against current sounds reasonable to me. Mike Waychison ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-24 18:33 ` Mike Waychison @ 2005-05-24 21:51 ` Jamie Lokier 0 siblings, 0 replies; 36+ messages in thread From: Jamie Lokier @ 2005-05-24 21:51 UTC (permalink / raw) To: Mike Waychison Cc: Miklos Szeredi, linuxram, linux-kernel, linux-fsdevel, akpm, viro Mike Waychison wrote: > > 1. Deny access to /proc/NNN/fd/, /proc/NNN/cwd, /proc/NNN/root > > if task NNN cannot be ptraced. > > > > 3. Allow entry to /proc/NNN/fd/, /proc/NNN/cwd, /proc/NNN/root > > if ptrace is allowed; the namespace being irrelevant. > > > > 3. Use _exactly_ the same condition as for ptracing, > > i.e. MAY_PTRACE in fs/proc/base.c. Ensure that condition is > > consistent with the tests in kernel/ptrace.c, possibly putting > > the condition in a common header file to keep it consistent in > > future. > > > > 4. If further restrictions are desired, to make namespaces more > > strict, those should be implemented by further restrictions on > > which tasks are allowed to ptrace other tasks. > > > > Indeed. A combination of MAY_PTRACE ||ed with a check against current > sounds reasonable to me. Note that MAY_PTRACE already includes a check against current. -- Jamie ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC][PATCH] rbind across namespaces 2005-05-21 7:26 ` Ram 2005-05-21 8:09 ` Miklos Szeredi @ 2005-05-21 13:43 ` Jamie Lokier 1 sibling, 0 replies; 36+ messages in thread From: Jamie Lokier @ 2005-05-21 13:43 UTC (permalink / raw) To: Ram; +Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, Andrew Morton, viro Ram wrote: > > Also why check current->namespace? It doesn't hurt to do > > get_namespace() even if it's not strictly needed. And it would > > simplify the code. > > however get_namespace() will chew up some extra cycles > sometimes unnecessarily. I did incorporate your comment and > got much simpler code. Checking current->namespace also chews up extra cycles, sometimes unnecessarily, but the cycles are negligable in both cases. Because they're negligable, the most important thing is to avoid unnecessary complications in the code. -- Jamie ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2005-05-30 19:06 UTC | newest] Thread overview: 36+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-05-20 22:11 [RFC][PATCH] rbind across namespaces Ram 2005-05-21 6:27 ` Miklos Szeredi 2005-05-21 7:26 ` Ram 2005-05-21 8:09 ` Miklos Szeredi 2005-05-21 8:45 ` Ram 2005-05-21 9:09 ` Miklos Szeredi 2005-05-21 10:07 ` Ram 2005-05-21 13:12 ` Miklos Szeredi 2005-05-22 20:25 ` Ram 2005-05-22 20:51 ` Ram 2005-05-23 5:08 ` Miklos Szeredi 2005-05-23 7:24 ` Ram 2005-05-23 8:24 ` Miklos Szeredi 2005-05-21 9:48 ` Miklos Szeredi 2005-05-21 13:46 ` Jamie Lokier 2005-05-22 8:08 ` Miklos Szeredi 2005-05-22 17:04 ` [RFC][PATCH] /proc/dead_mounts support (Was: [RFC][PATCH] rbind across ...) Miklos Szeredi 2005-05-22 21:10 ` [RFC][PATCH] rbind across namespaces Ram 2005-05-23 5:07 ` Miklos Szeredi 2005-05-24 0:39 ` Mike Waychison 2005-05-24 5:43 ` Miklos Szeredi 2005-05-24 7:13 ` Mike Waychison 2005-05-24 8:25 ` Miklos Szeredi 2005-05-24 17:09 ` Mike Waychison 2005-05-24 17:31 ` Miklos Szeredi 2005-05-24 17:44 ` Mike Waychison 2005-05-24 17:56 ` Miklos Szeredi 2005-05-24 18:04 ` Mike Waychison 2005-05-30 19:06 ` Ram 2005-05-24 9:18 ` Miklos Szeredi 2005-05-24 17:15 ` Mike Waychison 2005-05-24 17:46 ` Miklos Szeredi 2005-05-24 18:15 ` Jamie Lokier 2005-05-24 18:33 ` Mike Waychison 2005-05-24 21:51 ` Jamie Lokier 2005-05-21 13:43 ` Jamie Lokier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).