open_tree, and bind-mounting directories across mount namespaces

public inbox for linux-fsdevel@vger.kernel.org

* open_tree, and bind-mounting directories across mount namespaces
@ 2025-10-31 23:01 Snaipe
  2025-11-01 15:21 ` Franklin Snaipe Mathieu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Snaipe @ 2025-10-31 23:01 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro

Hi folks,

(Disclaimer: I'm not a kernel developer)

I'm currently playing around with the new mount API, on Linux 6.17.6.
One of the things I'm trying to do is to write a program that unshares
its mount namespace and receives a directory file descriptor via a
unix socket from another program that exists in a different mount
namespace. The intent is to have a program that has access to data on
a filesystem that is not normally accessible to other unprivileged
programs, and have that program give access to select directories by
opening them with O_PATH and sending the fds over a unix socket.
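
A minimal sketch of that hand-off, assuming a connected AF_UNIX socket
and a single descriptor per message. The helper names send_dirfd and
recv_dirfd are hypothetical and are not taken from any library:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor as SCM_RIGHTS ancillary data. Returns 0 or -1. */
static int send_dirfd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must accompany the fd */
	struct iovec io = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char ctrl[CMSG_SPACE(sizeof(int))];
	} u;
	memset(&u, 0, sizeof(u));

	struct msghdr msg = {
		.msg_iov = &io,
		.msg_iovlen = 1,
		.msg_control = u.ctrl,
		.msg_controllen = sizeof(u.ctrl),
	};

	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor. Returns the new descriptor, or -1. */
static int recv_dirfd(int sock)
{
	char byte;
	struct iovec io = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char ctrl[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &io,
		.msg_iovlen = 1,
		.msg_control = u.ctrl,
		.msg_controllen = sizeof(u.ctrl),
	};

	if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
		return -1;

	/* Reject messages that do not carry exactly one descriptor. */
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	int fd;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}
```

The sender would call send_dirfd(sock, open(dir, O_PATH|O_DIRECTORY));
the received descriptor keeps its O_PATH status on the other side.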

One snag I'm currently hitting is that once I call open_tree(fd, "",
OPEN_TREE_CLONE|AT_EMPTY_PATH|AT_RECURSIVE), the syscall returns
EINVAL; I've bpftraced it back to __do_loopback's may_copy_tree check
and it looks like it's impossible to do on dentries whose mount
namespace is different than the current task's mount namespace.

I'm trying to understand the reasons this was put in place, and what
it would take to enable the kind of use-case that I have. Would there
be a security risk to relax this condition with some kind of open_tree
flag?

Thanks,

-- 
Franklin "Snaipe" Mathieu


* Re: open_tree, and bind-mounting directories across mount namespaces
  2025-10-31 23:01 open_tree, and bind-mounting directories across mount namespaces Snaipe
@ 2025-11-01 15:21 ` Franklin Snaipe Mathieu
  2025-11-01 15:21 ` [PATCH 1/1] fs: let open_tree open mounts from another namespace Franklin Snaipe Mathieu
  2025-11-05 12:05 ` open_tree, and bind-mounting directories across mount namespaces Christian Brauner
  2 siblings, 0 replies; 9+ messages in thread
From: Franklin Snaipe Mathieu @ 2025-11-01 15:21 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Alexander Viro, Franklin "Snaipe" Mathieu

From: "Franklin \"Snaipe\" Mathieu" <me@snai.pe>

I actually ended up trying my initial idea of adding a new open_tree
flag. With this, I was able to make open_tree work on a file descriptor
that was opened in a different mount namespace.

I'm sure there are some sharp edges I haven't thought about that would
make this patch horribly incorrect, but I wanted to at least see
whether it would work. A cursory glance over the description of
may_copy_tree indicates that allowing this is, in principle, not
completely out of left field.

Franklin "Snaipe" Mathieu (1):
  fs: let open_tree open mounts from another namespace

 fs/namespace.c                                | 26 +++++++++++--------
 include/uapi/linux/mount.h                    |  1 +
 tools/include/uapi/linux/mount.h              |  1 +
 .../trace/beauty/include/uapi/linux/mount.h   |  1 +
 4 files changed, 18 insertions(+), 11 deletions(-)

-- 
2.51.2


* [PATCH 1/1] fs: let open_tree open mounts from another namespace
  2025-10-31 23:01 open_tree, and bind-mounting directories across mount namespaces Snaipe
  2025-11-01 15:21 ` Franklin Snaipe Mathieu
@ 2025-11-01 15:21 ` Franklin Snaipe Mathieu
  2025-11-05 12:05 ` open_tree, and bind-mounting directories across mount namespaces Christian Brauner
  2 siblings, 0 replies; 9+ messages in thread
From: Franklin Snaipe Mathieu @ 2025-11-01 15:21 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Alexander Viro, Franklin "Snaipe" Mathieu

From: "Franklin \"Snaipe\" Mathieu" <me@snai.pe>

This commit adds the OPEN_TREE_CROSSNS flag, which relaxes the
requirement for the passed mount to be in the same mount namespace as
the current task. Without that flag, the following sequence does not
work:

    int fd = open("/tmp", O_PATH);
    unshare(CLONE_NEWNS);
    // returns -EINVAL
    open_tree(fd, "", OPEN_TREE_CLONE|AT_EMPTY_PATH|AT_RECURSIVE);

This is because __do_loopback calls may_copy_tree, which ultimately
rejects paths whose mount exists in a different mount namespace than the
caller.

With OPEN_TREE_CROSSNS, the same sequence works, and it becomes possible
for the new mount namespace to bind-mount trees by file descriptors
opened in a different mount namespace.

Currently, this new flag is only valid when used with OPEN_TREE_CLONE.
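
The flag handling can be modelled in user space. The sketch below
mirrors only the two checks visible in the vfs_open_tree hunk of this
patch; OPEN_TREE_CROSSNS and its value 2 are proposed by this patch and
are not part of any released kernel, and both function names are
hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>	/* AT_* flags, O_CLOEXEC */
#include <stdbool.h>

/* Fallbacks in case the libc headers predate these definitions. */
#ifndef AT_NO_AUTOMOUNT
#define AT_NO_AUTOMOUNT 0x800
#endif
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#define OPEN_TREE_CROSSNS 2	/* assumption: exists only with this patch */

/* Model of the flag validation: false corresponds to -EINVAL. */
static bool open_tree_flags_ok(unsigned int flags)
{
	const unsigned int known = AT_EMPTY_PATH | AT_NO_AUTOMOUNT |
				   AT_RECURSIVE | AT_SYMLINK_NOFOLLOW |
				   OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC |
				   OPEN_TREE_CROSSNS;

	if (flags & ~known)
		return false;

	/* AT_RECURSIVE is only meaningful together with OPEN_TREE_CLONE. */
	if ((flags & (AT_RECURSIVE | OPEN_TREE_CLONE)) == AT_RECURSIVE)
		return false;

	return true;
}

/* cross_ns is only passed down the OPEN_TREE_CLONE (detached) path. */
static bool wants_cross_ns(unsigned int flags)
{
	return (flags & OPEN_TREE_CLONE) && (flags & OPEN_TREE_CROSSNS);
}
```

In this model, OPEN_TREE_CROSSNS without OPEN_TREE_CLONE passes
validation but has no effect, which matches the diff as posted.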

Signed-off-by: Franklin "Snaipe" Mathieu <me@snai.pe>
---
 fs/namespace.c                                | 26 +++++++++++--------
 include/uapi/linux/mount.h                    |  1 +
 tools/include/uapi/linux/mount.h              |  1 +
 .../trace/beauty/include/uapi/linux/mount.h   |  1 +
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d82910f33dc4..49239fa4d276 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2928,7 +2928,7 @@ static int do_change_type(const struct path *path, int ms_flags)
  *
  * Returns true if the mount tree can be copied, false otherwise.
  */
-static inline bool may_copy_tree(const struct path *path)
+static inline bool may_copy_tree(const struct path *path, bool cross_ns)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	const struct dentry_operations *d_op;
@@ -2946,18 +2946,21 @@ static inline bool may_copy_tree(const struct path *path)
 	if (!is_mounted(path->mnt))
 		return false;
 
-	return check_anonymous_mnt(mnt);
+	if (check_anonymous_mnt(mnt))
+		return true;
+
+	return cross_ns;
 }
 
 
-static struct mount *__do_loopback(const struct path *old_path, int recurse)
+static struct mount *__do_loopback(const struct path *old_path, int recurse, bool cross_ns)
 {
 	struct mount *old = real_mount(old_path->mnt);
 
 	if (IS_MNT_UNBINDABLE(old))
 		return ERR_PTR(-EINVAL);
 
-	if (!may_copy_tree(old_path))
+	if (!may_copy_tree(old_path, cross_ns))
 		return ERR_PTR(-EINVAL);
 
 	if (!recurse && __has_locked_children(old, old_path->dentry))
@@ -2994,7 +2997,7 @@ static int do_loopback(const struct path *path, const char *old_name,
 	if (!check_mnt(mp.parent))
 		return -EINVAL;
 
-	mnt = __do_loopback(&old_path, recurse);
+	mnt = __do_loopback(&old_path, recurse, false);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
@@ -3007,7 +3010,7 @@ static int do_loopback(const struct path *path, const char *old_name,
 	return err;
 }
 
-static struct mnt_namespace *get_detached_copy(const struct path *path, bool recursive)
+static struct mnt_namespace *get_detached_copy(const struct path *path, bool recursive, bool cross_ns)
 {
 	struct mnt_namespace *ns, *mnt_ns = current->nsproxy->mnt_ns, *src_mnt_ns;
 	struct user_namespace *user_ns = mnt_ns->user_ns;
@@ -3032,7 +3035,7 @@ static struct mnt_namespace *get_detached_copy(const struct path *path, bool rec
 			ns->seq_origin = src_mnt_ns->ns.ns_id;
 	}
 
-	mnt = __do_loopback(path, recursive);
+	mnt = __do_loopback(path, recursive, cross_ns);
 	if (IS_ERR(mnt)) {
 		emptied_ns = ns;
 		return ERR_CAST(mnt);
@@ -3046,9 +3049,9 @@ static struct mnt_namespace *get_detached_copy(const struct path *path, bool rec
 	return ns;
 }
 
-static struct file *open_detached_copy(struct path *path, bool recursive)
+static struct file *open_detached_copy(struct path *path, bool recursive, bool cross_ns)
 {
-	struct mnt_namespace *ns = get_detached_copy(path, recursive);
+	struct mnt_namespace *ns = get_detached_copy(path, recursive, cross_ns);
 	struct file *file;
 
 	if (IS_ERR(ns))
@@ -3070,12 +3073,13 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 	struct path path __free(path_put) = {};
 	int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW;
 	bool detached = flags & OPEN_TREE_CLONE;
+	bool cross_ns = flags & OPEN_TREE_CROSSNS;
 
 	BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC);
 
 	if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE |
 		      AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE |
-		      OPEN_TREE_CLOEXEC))
+		      OPEN_TREE_CLOEXEC | OPEN_TREE_CROSSNS))
 		return ERR_PTR(-EINVAL);
 
 	if ((flags & (AT_RECURSIVE | OPEN_TREE_CLONE)) == AT_RECURSIVE)
@@ -3096,7 +3100,7 @@ static struct file *vfs_open_tree(int dfd, const char __user *filename, unsigned
 		return ERR_PTR(ret);
 
 	if (detached)
-		return open_detached_copy(&path, flags & AT_RECURSIVE);
+		return open_detached_copy(&path, flags & AT_RECURSIVE, cross_ns);
 
 	return dentry_open(&path, O_PATH, current_cred());
 }
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 7fa67c2031a5..2b415115a651 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -62,6 +62,7 @@
  * open_tree() flags.
  */
 #define OPEN_TREE_CLONE		1		/* Clone the target tree and attach the clone */
+#define OPEN_TREE_CROSSNS	2		/* Allow mounts of trees from other namespaces */
 #define OPEN_TREE_CLOEXEC	O_CLOEXEC	/* Close the file on execve() */
 
 /*
diff --git a/tools/include/uapi/linux/mount.h b/tools/include/uapi/linux/mount.h
index 7fa67c2031a5..2b415115a651 100644
--- a/tools/include/uapi/linux/mount.h
+++ b/tools/include/uapi/linux/mount.h
@@ -62,6 +62,7 @@
  * open_tree() flags.
  */
 #define OPEN_TREE_CLONE		1		/* Clone the target tree and attach the clone */
+#define OPEN_TREE_CROSSNS	2		/* Allow mounts of trees from other namespaces */
 #define OPEN_TREE_CLOEXEC	O_CLOEXEC	/* Close the file on execve() */
 
 /*
diff --git a/tools/perf/trace/beauty/include/uapi/linux/mount.h b/tools/perf/trace/beauty/include/uapi/linux/mount.h
index 7fa67c2031a5..2b415115a651 100644
--- a/tools/perf/trace/beauty/include/uapi/linux/mount.h
+++ b/tools/perf/trace/beauty/include/uapi/linux/mount.h
@@ -62,6 +62,7 @@
  * open_tree() flags.
  */
 #define OPEN_TREE_CLONE		1		/* Clone the target tree and attach the clone */
+#define OPEN_TREE_CROSSNS	2		/* Allow mounts of trees from other namespaces */
 #define OPEN_TREE_CLOEXEC	O_CLOEXEC	/* Close the file on execve() */
 
 /*
-- 
2.51.2



* Re: open_tree, and bind-mounting directories across mount namespaces
  2025-10-31 23:01 open_tree, and bind-mounting directories across mount namespaces Snaipe
  2025-11-01 15:21 ` Franklin Snaipe Mathieu
  2025-11-01 15:21 ` [PATCH 1/1] fs: let open_tree open mounts from another namespace Franklin Snaipe Mathieu
@ 2025-11-05 12:05 ` Christian Brauner
  2025-11-06 12:59   ` Snaipe
  2 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2025-11-05 12:05 UTC (permalink / raw)
  To: Snaipe; +Cc: linux-fsdevel, Alexander Viro

On Sat, Nov 01, 2025 at 12:01:38AM +0100, Snaipe wrote:
> Hi folks,
> 
> (Disclaimer: I'm not a kernel developer)
> 
> I'm currently playing around with the new mount API, on Linux 6.17.6.
> One of the things I'm trying to do is to write a program that unshares
> its mount namespace and receives a directory file descriptor via an
> unix socket from another program that exists in a different mount
> namespace. The intent is to have a program that has access to data on
> a filesystem that is not normally accessible to other unprivileged
> programs, and have that program give access to select directories by
> opening them with O_PATH and sending the fds over a unix socket.
> 
> One snag I'm currently hitting is that once I call open_tree(fd, "",
> OPEN_TREE_CLONE|AT_EMPTY_PATH|AT_RECURSIVE), the syscall returns
> EINVAL; I've bpftraced it back to __do_loopback's may_copy_tree check
> and it looks like it's impossible to do on dentries whose mount
> namespace is different that the current task's mount namespace.
> 
> I'm trying to understand the reasons this was put in place, and what
> it would take to enable the kind of use-case that I have. Would there
> be a security risk to relax this condition with some kind of open_tree
> flag?

In principle it's doable just like I made statmount() and listmount()
allow you to operate across mount namespaces.

If we do this I don't think we need a new flag as in your new example.
We just need open_tree() to support being called on foreign mounts
provided the caller is privileged over the target mount namespace and it
needs a consistent permission model and loads of tests. So no flags
needed imho.

I can start looking into this next week or you can give it your own
shot.


* Re: open_tree, and bind-mounting directories across mount namespaces
  2025-11-05 12:05 ` open_tree, and bind-mounting directories across mount namespaces Christian Brauner
@ 2025-11-06 12:59   ` Snaipe
  2026-01-29 13:39     ` Snaipe
  0 siblings, 1 reply; 9+ messages in thread
From: Snaipe @ 2025-11-06 12:59 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Alexander Viro

[-- Attachment #1: Type: text/plain, Size: 2944 bytes --]

On Wed, Nov 5, 2025 at 1:05 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Sat, Nov 01, 2025 at 12:01:38AM +0100, Snaipe wrote:
> > Hi folks,
> >
> > (Disclaimer: I'm not a kernel developer)
> >
> > I'm currently playing around with the new mount API, on Linux 6.17.6.
> > One of the things I'm trying to do is to write a program that unshares
> > its mount namespace and receives a directory file descriptor via an
> > unix socket from another program that exists in a different mount
> > namespace. The intent is to have a program that has access to data on
> > a filesystem that is not normally accessible to other unprivileged
> > programs, and have that program give access to select directories by
> > opening them with O_PATH and sending the fds over a unix socket.
> >
> > One snag I'm currently hitting is that once I call open_tree(fd, "",
> > OPEN_TREE_CLONE|AT_EMPTY_PATH|AT_RECURSIVE), the syscall returns
> > EINVAL; I've bpftraced it back to __do_loopback's may_copy_tree check
> > and it looks like it's impossible to do on dentries whose mount
> > namespace is different that the current task's mount namespace.
> >
> > I'm trying to understand the reasons this was put in place, and what
> > it would take to enable the kind of use-case that I have. Would there
> > be a security risk to relax this condition with some kind of open_tree
> > flag?
>
> In principle it's doable just like I made statmount() and listmount()
> allow you to operate across mount namespaces.
>
> If we do this I don't think we need a new flag as in your new example.
> We just need open_tree() to support being called on foreign mounts
> provided the caller is privileged over the target mount namespace and it
> needs a consistent permission model and loads of tests. So no flags
> needed imho.

To clarify: is the target mount ns here the mount ns of the caller
of open_tree, or is it the mount namespace of the specified tree?

The former is what I'd expect already (and should be covered by the
current permission check); the latter would make it very difficult for
a process that has called unshare(CLONE_NEWUSER | CLONE_NEWNS) to
receive file descriptors from a parent process (or processes in other
mount namespaces) and mount them, since it would not hold privileges
over the other process' mount namespace.

I've attached an example program of what I'm looking for: a parent
forking a child, where the child unshares the user and mount ns,
receives a file descriptor (for /tmp) from the parent, and bind-mounts
it (onto /mnt) in its own namespace.

>
> I can start looking into this next week or you can give it your own
> shot.

Thanks Christian; I think you have more context than me to be able to
do something better here. I might give it a shot if I get more time on
this in a couple of weeks if you haven't already by that time.

-- 
Franklin "Snaipe" Mathieu

[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 2542 bytes --]

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <linux/prctl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Proposed flag from the patch in this thread, named OPEN_TREE_CROSSNS there. */
#define OPEN_TREE_CROSSFS 2

static int recv_fd(int socket)
{
	char buf[1];
	struct iovec io = {
		.iov_base = buf,
		.iov_len = 1,
	};
	union {
		struct cmsghdr _align;
		char ctrl[CMSG_SPACE(sizeof(int))];
	} u;

	struct msghdr msg = {
		.msg_control = u.ctrl,
		.msg_controllen = sizeof(u.ctrl),
		.msg_iov = &io,
		.msg_iovlen = 1,
	};

	ssize_t recv = recvmsg(socket, &msg, 0);
	if (recv == -1) {
		err(1, "recv_fd: recvmsg");
	}

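	/* Make sure the message actually carries a file descriptor. */
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS) {
		errx(1, "recv_fd: no file descriptor received");
	}
	return *((int*) CMSG_DATA(cmsg));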
}

static void send_fd(int socket, int fd)
{
	char buf[1] = {0};
	struct iovec io = {
		.iov_base = buf,
		.iov_len = 1,
	};
	union {
		struct cmsghdr _align;
		char ctrl[CMSG_SPACE(sizeof(int))];
	} u;
	memset(&u, 0, sizeof(u));

	struct msghdr msg = {
		.msg_control = u.ctrl,
		.msg_controllen = sizeof(u.ctrl),
		.msg_iov = &io,
		.msg_iovlen = 1,
	};

	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	*((int*) CMSG_DATA(cmsg)) = fd;

	if (sendmsg(socket, &msg, 0) < 0) {
		err(1, "send_fd: sendmsg");
	}
}

int main(int argc, char *argv[])
{
	int sockets[2];
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sockets) == -1) {
		err(1, "socketpair");
	}

	pid_t pid = fork();
	if (pid == -1) {
		err(1, "fork");
	}

	if (pid) {
		int fd = open("/tmp", O_PATH|O_DIRECTORY, 0);
		if (fd == -1) {
			err(1, "open");
		}
		send_fd(sockets[0], fd);

		if (waitpid(pid, NULL, 0) == -1) {
			err(1, "waitpid");
		}
		return 0;
	}

	if (prctl(PR_SET_PDEATHSIG, SIGKILL) == -1) {
		err(1, "prctl");
	}
	if (unshare(CLONE_NEWUSER|CLONE_NEWNS) == -1) {
		err(1, "unshare");
	}

	int flags = OPEN_TREE_CLONE|AT_EMPTY_PATH|AT_RECURSIVE;

	const char *crossfs = getenv("CROSSFS");
	if (crossfs && !strcmp(crossfs, "1")) {
		flags |= OPEN_TREE_CROSSFS;
	}

	int fd1 = recv_fd(sockets[1]);
	int fd2 = open_tree(fd1, "", flags);
	if (fd2 == -1) {
		err(1, "open_tree");
	}

	if (move_mount(fd2, "", -1, "/mnt", MOVE_MOUNT_F_EMPTY_PATH) == -1) {
		err(1, "move_mount");
	}

	if (close_range(3, (unsigned)-1, 0) == -1) {
		err(1, "close_range");
	}

	if (argc == 1) {
		execlp("sh", "-sh", (char *)NULL);
		err(1, "execlp sh");
	} else {
		execvp(argv[1], &argv[1]);
		err(1, "execvp %s", argv[1]);
	}
}


* Re: open_tree, and bind-mounting directories across mount namespaces
  2025-11-06 12:59   ` Snaipe
@ 2026-01-29 13:39     ` Snaipe
  2026-01-29 14:54       ` Christian Brauner
  0 siblings, 1 reply; 9+ messages in thread
From: Snaipe @ 2026-01-29 13:39 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Alexander Viro

Hi Christian,

I have time to look at this again. I'm however unclear on the
permission model that should be applicable here.

My overarching motivation is to be able to have a process in a
user+mount namespace pass file descriptors to another process in a
different user+mount namespace, which then bind-mounts them. It seems
to me that the only real checks here are that 1) the file descriptor
points to a tree that is still mounted and 2) the caller has
CAP_SYS_ADMIN in the user namespace that owns the mount namespace in
which the caller operates, and both checks seem to be effective as of
today.

It sounds like may_copy_tree should just be changed to this:

> @@ -2946,18 +2946,21 @@ static inline bool may_copy_tree(const struct path *path)
>         if (!is_mounted(path->mnt))
>                 return false;
>
> -       return check_anonymous_mnt(mnt);
> +       return true;
>  }

But the above worries me, because I do not think I understand
may_copy_tree well enough to warrant the deletion of
check_anonymous_mnt, or the reason the check is written this way in
the first place.

Any advice would be appreciated.

--
Franklin "Snaipe" Mathieu
🝰 https://snai.pe


* Re: open_tree, and bind-mounting directories across mount namespaces
  2026-01-29 13:39     ` Snaipe
@ 2026-01-29 14:54       ` Christian Brauner
  2026-01-29 19:14         ` Snaipe
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2026-01-29 14:54 UTC (permalink / raw)
  To: Snaipe; +Cc: linux-fsdevel, Alexander Viro

On Thu, Jan 29, 2026 at 02:39:19PM +0100, Snaipe wrote:
> Hi Christian,
> 
> I have time to look at this again. I'm however unclear on the
> permission model that should be applicable here.
> 
> My overarching motivation is to be able to have a process in a
> user+mount namespace pass file descriptors to another process in a
> different user+mount namespace, which then bind-mounts them. It seems
> to me that the only real checks here are that 1) the file descriptor
> points to a tree that is still mounted and 2) the caller has
> CAP_SYS_ADMIN in the user namespace that owns the mount namespace in
> which the caller operates, and both checks seem to be effective as of
> today.
> 
> It sounds like may_copy_tree should just be changed to this:
> 
> > @@ -2946,18 +2946,21 @@ static inline bool may_copy_tree(const struct path *path)
> >         if (!is_mounted(path->mnt))
> >                 return false;
> >
> > -       return check_anonymous_mnt(mnt);
> > +       return true;
> >  }
> 
> But the above worries me, because I do not think I understand enough
> may_copy_tree to warrant the deletion of check_anonymous_mnt, and the
> reason why the check is this way in the first place.
> 
> Any advice would be appreciated.

I think I might have even left a comment somewhere in the code...
The gist is something like:

diff --git a/fs/namespace.c b/fs/namespace.c
index ad35f8c961ef..e78aff6b3bf7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -961,8 +961,7 @@ static inline bool check_anonymous_mnt(struct mount *mnt)
        if (!is_anon_ns(mnt->mnt_ns))
                return false;

-       seq = mnt->mnt_ns->seq_origin;
-       return !seq || (seq == current->nsproxy->mnt_ns->ns.ns_id);
+       return ns_capable_noaudit(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN);
 }

where we allow creating detached mounts or mounting on top of a detached
mount provided the caller is privileged over the owning userns of the
mount namespace.

But then the may_mount() check would also have to be changed so that a
caller unprivileged in their current mount namespace can still
create/attach detached mounts in anonymous mount namespaces they are
privileged over.

Even the check_mnt() checks should be relaxed for move_mount() so that
you can attach a detached mount in a mount namespace that you have
privilege over. I'd need to see it in patch form though.
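
To make the difference concrete, here is a toy user-space model of the
two versions of check_anonymous_mnt in the diff above. It is only a
sketch: the struct and its fields are hypothetical stand-ins for kernel
state, and caller_is_admin replaces the result of the
ns_capable_noaudit() call:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the mount namespace a mount belongs to. */
struct model_mnt_ns {
	bool anonymous;		/* models is_anon_ns() */
	uint64_t seq_origin;	/* id of the namespace the copy came from, 0 if none */
	bool caller_is_admin;	/* models CAP_SYS_ADMIN over the owning userns */
};

/* Before: the detached tree must originate from the caller's mount namespace. */
static bool old_check_anonymous_mnt(const struct model_mnt_ns *ns,
				    uint64_t current_ns_id)
{
	if (!ns->anonymous)
		return false;
	return !ns->seq_origin || ns->seq_origin == current_ns_id;
}

/* After: the origin no longer matters, only privilege over the owning userns. */
static bool new_check_anonymous_mnt(const struct model_mnt_ns *ns)
{
	if (!ns->anonymous)
		return false;
	return ns->caller_is_admin;
}
```

In this model, a detached tree that originated in another mount
namespace is rejected by the old check and accepted by the new one as
long as the caller is privileged over the owning user namespace.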


* Re: open_tree, and bind-mounting directories across mount namespaces
  2026-01-29 14:54       ` Christian Brauner
@ 2026-01-29 19:14         ` Snaipe
  2026-01-30 14:55           ` Snaipe
  0 siblings, 1 reply; 9+ messages in thread
From: Snaipe @ 2026-01-29 19:14 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Alexander Viro

On Thu, Jan 29, 2026 at 3:54 PM Christian Brauner <brauner@kernel.org> wrote:
> I think I might have even left a comment somewhere in the code...
> The gist is something like:
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ad35f8c961ef..e78aff6b3bf7 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -961,8 +961,7 @@ static inline bool check_anonymous_mnt(struct mount *mnt)
>         if (!is_anon_ns(mnt->mnt_ns))
>                 return false;
>
> -       seq = mnt->mnt_ns->seq_origin;
> -       return !seq || (seq == current->nsproxy->mnt_ns->ns.ns_id);
> +       return ns_capable_noaudit(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN);
>  }
>
> where we allow creating detached mounts or mounting on top of a detached
> mount provided the caller is privileged over the owning userns of the
> mount namespace.
>
> But then the may_mount() check would also have to be changed so that a
> caller unprivileged in their current mount namespace can still
> created/attach detached mounts in anonymous mount namespaces they are
> privileged over.
>
> Even the check_mnt() checks should be relaxed for move_mount() so that
> you can attach a detached mount in a mount namespace that you have
> privilege over. I'd need to see it in patch form though.

I was a bit confused initially but I think I'm starting to see the picture.

In my original attempt, process A (privileged in user ns A and mount
ns A) would open a file descriptor, send it to process B (privileged
in user ns B and mount ns B, but not A), which would then call
open_tree followed by move_mount. The issue with this approach is that
the file descriptor's path from which we get the detached copy is
still in mount ns A, over which process B is not privileged.

If I understand you correctly, you're saying instead that process A
should be the one doing open_tree to get a detached tree (which it can
since it is privileged over user ns A), send it to process B, which
then calls move_mount. Today, this operation fails with EINVAL, but
the point would be to relax the checks in move_mount so that processes
can mount any detached trees in mount namespaces they are privileged
over, even if said detached tree originated from a mount namespace they
are not privileged over.

Am I interpreting your point correctly?

--
Franklin "Snaipe" Mathieu
🝰 https://snai.pe


* Re: open_tree, and bind-mounting directories across mount namespaces
  2026-01-29 19:14         ` Snaipe
@ 2026-01-30 14:55           ` Snaipe
  0 siblings, 0 replies; 9+ messages in thread
From: Snaipe @ 2026-01-30 14:55 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Alexander Viro

On Thu, Jan 29, 2026 at 8:14 PM Snaipe <me@snai.pe> wrote:
> I was a bit confused initially but I think I'm starting to see the picture.
>
> In my original attempt, process A (privileged in user ns A and mount
> ns A) would open a file descriptor, send it to process B (privileged
> in user ns B and mount ns B, but not A), which would then call
> open_tree followed by move_mount. The issue with this approach is that
> the file descriptor's path from which we get the detached copy is
> still in mount ns A, over which process B is not privileged over.
>
> If I understand you correctly, you're saying instead that process A
> should be the one doing open_tree to get a detached tree (which it can
> since it is privileged over user ns A), send it to process B, which
> then calls move_mount. Today, this operation fails with EINVAL, but
> the point would be to relax the checks in move_mount so that processes
> can mount any detached trees in mount namespaces they are privileged
> on, even if said detached tree originated from a mount namespace they
> are not privileged on.
>
> Am I interpreting your point correctly?
>

I ended up adjusting my prototype in that direction -- the good news
is that it already works as expected: if process A calls open_tree and
sends that file descriptor to process B, then process B can move_mount
the result. I'm not sure since when, but at least right now it works
on my machine which runs 6.12.63.

The not-that-good news is that I'm getting bitten by close semantics
weirdness. One thing I need is for process A to send a file descriptor
that has been flock'ed -- typically because this guards the tree being
sent against deletion by a local agent. Since open_tree returns a file
descriptor that is essentially O_PATH, it means I can't flock it, so
instead, I'm reopening the root of the tree with O_RDONLY. So far so
good:

    // on the sending end
    int fd1 = open_tree(...);
    int fd2 = openat(fd1, ".", O_RDONLY);
    flock(fd2, LOCK_EX);
    send_fd(fd2);

    // on the receiving end
    int fd = recv_fd();
    move_mount(fd, "", -1, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
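
The reopen step can be illustrated without any privileges: flock() on an
O_PATH descriptor fails with EBADF, while a descriptor reopened from it
with O_RDONLY can be locked. The sketch below uses an ordinary directory
rather than a detached tree, and the helper names try_lock and
reopen_and_lock are hypothetical:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to lock fd without blocking. Returns 0 on success, else the errno value. */
static int try_lock(int fd, int op)
{
	if (flock(fd, op | LOCK_NB) == -1)
		return errno;
	return 0;
}

/*
 * Reopen the directory behind an O_PATH descriptor for reading and lock it.
 * Returns the locked descriptor, or -1 with errno set.
 */
static int reopen_and_lock(int path_fd, int op)
{
	int fd = openat(path_fd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd == -1)
		return -1;

	int err = try_lock(fd, op);
	if (err) {
		close(fd);
		errno = err;
		return -1;
	}
	return fd;
}
```

With the snippet above, the sender would call
reopen_and_lock(fd1, LOCK_EX) and send the returned descriptor.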

Now, the issue is that if the sender closes fd1 after sending fd2, and
the receiver calls move_mount after fd1 has been closed, then it will
return EINVAL, while it'll work perfectly if it manages to win the
race and call move_mount before fd1 has been closed.

I haven't had time to debug this properly yet (although I'm going to
look into this) but my guess is that when fd1 is closed, the detached
tree is marked unmounted even if there are open file descriptors
pointing to content within?
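
Until the cause is understood, one way to sidestep the race is for the
sender to keep fd1 open until the receiver confirms that move_mount has
returned. A minimal sketch of such an acknowledgement over the same
socket, assuming the behaviour reported above (move_mount succeeds while
fd1 is still open); signal_mounted and wait_until_mounted are
hypothetical helpers:

```c
#include <errno.h>
#include <unistd.h>

/* Receiver side: call after move_mount() has returned, passing whether it worked. */
static int signal_mounted(int sock, int ok)
{
	char status = ok ? 1 : 0;
	ssize_t n;

	do {
		n = write(sock, &status, 1);
	} while (n == -1 && errno == EINTR);

	return n == 1 ? 0 : -1;
}

/*
 * Sender side: block until the receiver reports back, and only then close fd1.
 * Returns 1 if the receiver mounted the tree, 0 if it reported failure,
 * and -1 on error or if the peer went away without reporting.
 */
static int wait_until_mounted(int sock)
{
	char status;
	ssize_t n;

	do {
		n = read(sock, &status, 1);
	} while (n == -1 && errno == EINTR);

	if (n != 1)
		return -1;
	return status ? 1 : 0;
}
```

The sender would call wait_until_mounted(sock) right after send_fd(fd2)
and close fd1 once it returns.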

-- 
Franklin "Snaipe" Mathieu
🝰 https://snai.pe
