linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities
@ 2024-08-06 16:02 Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 1/6] proc: proc_readfd() -> proc_fd_iterate_shared() Christian Brauner
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Christian Brauner @ 2024-08-06 16:02 UTC (permalink / raw)
  To: Linus Torvalds, linux-fsdevel
  Cc: Alexander Viro, Jan Kara, Aleksa Sarai, Christian Brauner

(Preface because I've been panick-approached by people at conference
 when we discussed this before: overmounting any global procfs files
 such as /proc/status remains unaffected and is an existing and
 supported use-case.)

It is currently possible to mount on top of various ephemeral entities
in procfs. This specifically includes magic links. To recap, magic links
are links of the form /proc/<pid>/fd/<nr>. They serve as references to
a target file and during path lookup they cause a jump to the target
path. Such magic links disappear if the corresponding file descriptor is
closed.

Currently it is possible to overmount such magic links:

int fd = open("/mnt/foo", O_RDONLY);
sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
int fd2 = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
mount("/mnt/bar", path, "", MS_BIND, 0);

Arguably, this is nonsensical and is mostly interesting for an attacker
that wants to somehow trick a process into e.g., reopening something
that they didn't intend to reopen or to hide a malicious file
descriptor.

But also it risks leaking mounts for long-running processes. When
overmounting a magic link like above, the mount will not be detached
when the file descriptor is closed. Only the target mountpoint will
disappear. Which has the consequence of making it impossible to unmount
that mount afterwards. So the mount will stick around until the process
exits and the /proc/<pid>/ directory is cleaned up during
proc_flush_pid() when the dentries are pruned and invalidated.

That in turn means it's possible for a program to accidentally leak
mounts and it's also possible to make a task leak mounts without it's
knowledge if the attacker just keeps overmounting things under
/proc/<pid>/fd/<nr>.

I think it's wrong to try and fix this by us starting to play games with
close() or somewhere else to undo these mounts when the file descriptor
is closed. The fact that we allow overmounting of such magic links is
simply a bug and one that we need to fix.

Similar things can be said about entries under fdinfo/ and map_files/ so
those are restricted as well.

I have a further more aggressive patch that gets out the big hammer and
makes everything under /proc/<pid>/*, as well as immediate symlinks such
as /proc/self, /proc/thread-self, /proc/mounts, /proc/net that point
into /proc/<pid>/ not overmountable. Imho, all of this should be blocked
if we can get away with it. It's only useful to hide exploits such as in [1].

And again, overmounting of any global procfs files remains unaffected
and is an existing and supported use-case.

Link: https://righteousit.com/2024/07/24/hiding-linux-processes-with-bind-mounts [1]

// Note that repro uses the traditional way of just mounting over
// /proc/<pid>/fd/<nr>. This could also all be achieved just based on
// file descriptors using move_mount(). So /proc/<pid>/fd/<nr> isn't the
// only entry vector here. It's also possible to e.g., mount directly
// onto /proc/<pid>/map_files/* without going over /proc/<pid>/fd/<nr>.
int main(int argc, char *argv[])
{
        char path[PATH_MAX];

        creat("/mnt/foo", 0777);
        creat("/mnt/bar", 0777);

        /*
         * For illustration use a bunch of file descriptors in the upper
         * range that are unused.
         */
        for (int i = 10000; i >= 256; i--) {
                printf("I'm: /proc/%d/\n", getpid());

                int fd2 = open("/mnt/foo", O_RDONLY);
                if (fd2 < 0) {
                        printf("%m - Failed to open\n");
                        _exit(1);
                }

                int newfd = dup2(fd2, i);
                if (newfd < 0) {
                        printf("%m - Failed to dup\n");
                        _exit(1);
                }
                close(fd2);

                sprintf(path, "/proc/%d/fd/%d", getpid(), newfd);
                int fd = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
                if (fd < 0) {
                        printf("%m - Failed to open\n");
                        _exit(3);
                }

                sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
                printf("Mounting on top of %s\n", path);
                if (mount("/mnt/bar", path, "", MS_BIND, 0)) {
                        printf("%m - Failed to mount\n");
                        _exit(4);
                }

                close(newfd);
                close(fd2);
        }

        /*
         * Give some time to look at things. The mounts now linger until
         * the process exits.
         */
        sleep(10000);
        _exit(0);
}

Co-developed-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20240731-work-procfs-e82dd889b773


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RFC 1/6] proc: proc_readfd() -> proc_fd_iterate_shared()
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
@ 2024-08-06 16:02 ` Christian Brauner
  2024-08-06 17:06   ` Linus Torvalds
  2024-08-06 16:02 ` [PATCH RFC 2/6] proc: proc_readfdinfo() -> proc_fdinfo_iterate_shared() Christian Brauner
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 11+ messages in thread
From: Christian Brauner @ 2024-08-06 16:02 UTC (permalink / raw)
  To: Linus Torvalds, linux-fsdevel
  Cc: Alexander Viro, Jan Kara, Aleksa Sarai, Christian Brauner

Give the method to iterate through the fd directory a better name.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/proc/fd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 586bbc84ca04..41bc75d5060c 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -312,14 +312,14 @@ static int proc_readfd_count(struct inode *inode, loff_t *count)
 	return 0;
 }
 
-static int proc_readfd(struct file *file, struct dir_context *ctx)
+static int proc_fd_iterate_shared(struct file *file, struct dir_context *ctx)
 {
 	return proc_readfd_common(file, ctx, proc_fd_instantiate);
 }
 
 const struct file_operations proc_fd_operations = {
 	.read		= generic_read_dir,
-	.iterate_shared	= proc_readfd,
+	.iterate_shared	= proc_fd_iterate_shared,
 	.llseek		= generic_file_llseek,
 };
 

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 2/6] proc: proc_readfdinfo() -> proc_fdinfo_iterate_shared()
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 1/6] proc: proc_readfd() -> proc_fd_iterate_shared() Christian Brauner
@ 2024-08-06 16:02 ` Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 3/6] proc: add proc_splice_unmountable() Christian Brauner
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2024-08-06 16:02 UTC (permalink / raw)
  To: Linus Torvalds, linux-fsdevel
  Cc: Alexander Viro, Jan Kara, Aleksa Sarai, Christian Brauner

Give the method to iterate through the fdinfo directory a better name.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/proc/fd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 41bc75d5060c..ab243caf1b71 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -407,7 +407,7 @@ proc_lookupfdinfo(struct inode *dir, struct dentry *dentry, unsigned int flags)
 	return proc_lookupfd_common(dir, dentry, proc_fdinfo_instantiate);
 }
 
-static int proc_readfdinfo(struct file *file, struct dir_context *ctx)
+static int proc_fdinfo_iterate_shared(struct file *file, struct dir_context *ctx)
 {
 	return proc_readfd_common(file, ctx,
 				  proc_fdinfo_instantiate);
@@ -421,6 +421,6 @@ const struct inode_operations proc_fdinfo_inode_operations = {
 
 const struct file_operations proc_fdinfo_operations = {
 	.read		= generic_read_dir,
-	.iterate_shared	= proc_readfdinfo,
+	.iterate_shared	= proc_fdinfo_iterate_shared,
 	.llseek		= generic_file_llseek,
 };

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 3/6] proc: add proc_splice_unmountable()
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 1/6] proc: proc_readfd() -> proc_fd_iterate_shared() Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 2/6] proc: proc_readfdinfo() -> proc_fdinfo_iterate_shared() Christian Brauner
@ 2024-08-06 16:02 ` Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 4/6] proc: block mounting on top of /proc/<pid>/map_files/* Christian Brauner
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2024-08-06 16:02 UTC (permalink / raw)
  To: Linus Torvalds, linux-fsdevel
  Cc: Alexander Viro, Jan Kara, Aleksa Sarai, Christian Brauner

Add a tiny procfs helper to splice a dentry that cannot be mounted upon.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/proc/internal.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index a8a8576d8592..9e3f25e4c188 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -349,3 +349,16 @@ static inline void pde_force_lookup(struct proc_dir_entry *pde)
 	/* /proc/net/ entries can be changed under us by setns(CLONE_NEWNET) */
 	pde->proc_dops = &proc_net_dentry_ops;
 }
+
+/*
+ * Add a new procfs dentry that can't serve as a mountpoint. That should
+ * encompass anything that is ephemeral and can just disappear while the
+ * process is still around.
+ */
+static inline struct dentry *proc_splice_unmountable(struct inode *inode,
+		struct dentry *dentry, const struct dentry_operations *d_ops)
+{
+	d_set_d_op(dentry, d_ops);
+	dont_mount(dentry);
+	return d_splice_alias(inode, dentry);
+}

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 4/6] proc: block mounting on top of /proc/<pid>/map_files/*
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
                   ` (2 preceding siblings ...)
  2024-08-06 16:02 ` [PATCH RFC 3/6] proc: add proc_splice_unmountable() Christian Brauner
@ 2024-08-06 16:02 ` Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 5/6] proc: block mounting on top of /proc/<pid>/fd/* Christian Brauner
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2024-08-06 16:02 UTC (permalink / raw)
  To: Linus Torvalds, linux-fsdevel
  Cc: Alexander Viro, Jan Kara, Aleksa Sarai, Christian Brauner

Entries under /proc/<pid>/map_files/* are ephemeral and may go away
before the process dies. As such allowing them to be used as mount
points creates the ability to leak mounts that linger until the process
dies with no ability to unmount them until then. Don't allow using them
as mountpoints.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/proc/base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 72a1acd03675..fce3d377b826 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2276,8 +2276,8 @@ proc_map_files_instantiate(struct dentry *dentry,
 	inode->i_op = &proc_map_files_link_inode_operations;
 	inode->i_size = 64;
 
-	d_set_d_op(dentry, &tid_map_files_dentry_operations);
-	return d_splice_alias(inode, dentry);
+	return proc_splice_unmountable(inode, dentry,
+				       &tid_map_files_dentry_operations);
 }
 
 static struct dentry *proc_map_files_lookup(struct inode *dir,

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 5/6] proc: block mounting on top of /proc/<pid>/fd/*
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
                   ` (3 preceding siblings ...)
  2024-08-06 16:02 ` [PATCH RFC 4/6] proc: block mounting on top of /proc/<pid>/map_files/* Christian Brauner
@ 2024-08-06 16:02 ` Christian Brauner
  2024-08-06 16:02 ` [PATCH RFC 6/6] proc: block mounting on top of /proc/<pid>/fdinfo/* Christian Brauner
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2024-08-06 16:02 UTC (permalink / raw)
  To: Linus Torvalds, linux-fsdevel
  Cc: Alexander Viro, Jan Kara, Aleksa Sarai, Christian Brauner

Entries under /proc/<pid>/fd/* are ephemeral and may go away before the
process dies. As such allowing them to be used as mount points creates
the ability to leak mounts that linger until the process dies with no
ability to unmount them until then. Don't allow using them as
mountpoints.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/proc/fd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index ab243caf1b71..f6b7344b9b2e 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -220,8 +220,8 @@ static struct dentry *proc_fd_instantiate(struct dentry *dentry,
 	ei->op.proc_get_link = proc_fd_link;
 	tid_fd_update_inode(task, inode, data->mode);
 
-	d_set_d_op(dentry, &tid_fd_dentry_operations);
-	return d_splice_alias(inode, dentry);
+	return proc_splice_unmountable(inode, dentry,
+				       &tid_fd_dentry_operations);
 }
 
 static struct dentry *proc_lookupfd_common(struct inode *dir,

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RFC 6/6] proc: block mounting on top of /proc/<pid>/fdinfo/*
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
                   ` (4 preceding siblings ...)
  2024-08-06 16:02 ` [PATCH RFC 5/6] proc: block mounting on top of /proc/<pid>/fd/* Christian Brauner
@ 2024-08-06 16:02 ` Christian Brauner
  2024-08-07 14:33 ` [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Josef Bacik
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2024-08-06 16:02 UTC (permalink / raw)
  To: Linus Torvalds, linux-fsdevel
  Cc: Alexander Viro, Jan Kara, Aleksa Sarai, Christian Brauner

Entries under /proc/<pid>/fdinfo/* are ephemeral and may go away before
the process dies. As such allowing them to be used as mount points
creates the ability to leak mounts that linger until the process dies
with no ability to unmount them until then. Don't allow using them as
mountpoints.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/proc/fd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index f6b7344b9b2e..e9ac955ca9f3 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -397,8 +397,8 @@ static struct dentry *proc_fdinfo_instantiate(struct dentry *dentry,
 	inode->i_fop = &proc_fdinfo_file_operations;
 	tid_fd_update_inode(task, inode, 0);
 
-	d_set_d_op(dentry, &tid_fd_dentry_operations);
-	return d_splice_alias(inode, dentry);
+	return proc_splice_unmountable(inode, dentry,
+				       &tid_fd_dentry_operations);
 }
 
 static struct dentry *

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC 1/6] proc: proc_readfd() -> proc_fd_iterate_shared()
  2024-08-06 16:02 ` [PATCH RFC 1/6] proc: proc_readfd() -> proc_fd_iterate_shared() Christian Brauner
@ 2024-08-06 17:06   ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2024-08-06 17:06 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Alexander Viro, Jan Kara, Aleksa Sarai

On Tue, 6 Aug 2024 at 09:02, Christian Brauner <brauner@kernel.org> wrote:
>
> -static int proc_readfd(struct file *file, struct dir_context *ctx)
> +static int proc_fd_iterate_shared(struct file *file, struct dir_context *ctx)

I think the "proc_fd_iterate" part is great.

The "shared" part I'm not super-excited about, simply because we
finally got rid of the non-shared 'iterate' function entirely last
year, and so the "_shared" part is pure historical naming.

In fact, I was hoping we'd do an automated rename at some point. Maybe
not to "iterate" (too generic a name), but something like
"iterate_dir" or something would avoid the now pointless "shared"
thing.

                 Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
                   ` (5 preceding siblings ...)
  2024-08-06 16:02 ` [PATCH RFC 6/6] proc: block mounting on top of /proc/<pid>/fdinfo/* Christian Brauner
@ 2024-08-07 14:33 ` Josef Bacik
  2025-09-30  8:53 ` Petr Vorel
  2025-09-30 15:51 ` Aleksa Sarai
  8 siblings, 0 replies; 11+ messages in thread
From: Josef Bacik @ 2024-08-07 14:33 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, linux-fsdevel, Alexander Viro, Jan Kara,
	Aleksa Sarai

On Tue, Aug 06, 2024 at 06:02:26PM +0200, Christian Brauner wrote:
> (Preface because I've been panick-approached by people at conference
>  when we discussed this before: overmounting any global procfs files
>  such as /proc/status remains unaffected and is an existing and
>  supported use-case.)
> 
> It is currently possible to mount on top of various ephemeral entities
> in procfs. This specifically includes magic links. To recap, magic links
> are links of the form /proc/<pid>/fd/<nr>. They serve as references to
> a target file and during path lookup they cause a jump to the target
> path. Such magic links disappear if the corresponding file descriptor is
> closed.
> 
> Currently it is possible to overmount such magic links:
> 
> int fd = open("/mnt/foo", O_RDONLY);
> sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
> int fd2 = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
> mount("/mnt/bar", path, "", MS_BIND, 0);
> 
> Arguably, this is nonsensical and is mostly interesting for an attacker
> that wants to somehow trick a process into e.g., reopening something
> that they didn't intend to reopen or to hide a malicious file
> descriptor.
> 
> But also it risks leaking mounts for long-running processes. When
> overmounting a magic link like above, the mount will not be detached
> when the file descriptor is closed. Only the target mountpoint will
> disappear. Which has the consequence of making it impossible to unmount
> that mount afterwards. So the mount will stick around until the process
> exits and the /proc/<pid>/ directory is cleaned up during
> proc_flush_pid() when the dentries are pruned and invalidated.
> 
> That in turn means it's possible for a program to accidentally leak
> mounts and it's also possible to make a task leak mounts without it's
> knowledge if the attacker just keeps overmounting things under
> /proc/<pid>/fd/<nr>.
> 
> I think it's wrong to try and fix this by us starting to play games with
> close() or somewhere else to undo these mounts when the file descriptor
> is closed. The fact that we allow overmounting of such magic links is
> simply a bug and one that we need to fix.
> 
> Similar things can be said about entries under fdinfo/ and map_files/ so
> those are restricted as well.
> 
> I have a further more aggressive patch that gets out the big hammer and
> makes everything under /proc/<pid>/*, as well as immediate symlinks such
> as /proc/self, /proc/thread-self, /proc/mounts, /proc/net that point
> into /proc/<pid>/ not overmountable. Imho, all of this should be blocked
> if we can get away with it. It's only useful to hide exploits such as in [1].
> 
> And again, overmounting of any global procfs files remains unaffected
> and is an existing and supported use-case.
> 
> Link: https://righteousit.com/2024/07/24/hiding-linux-processes-with-bind-mounts [1]
> 
> // Note that repro uses the traditional way of just mounting over
> // /proc/<pid>/fd/<nr>. This could also all be achieved just based on
> // file descriptors using move_mount(). So /proc/<pid>/fd/<nr> isn't the
> // only entry vector here. It's also possible to e.g., mount directly
> // onto /proc/<pid>/map_files/* without going over /proc/<pid>/fd/<nr>.
> int main(int argc, char *argv[])
> {
>         char path[PATH_MAX];
> 
>         creat("/mnt/foo", 0777);
>         creat("/mnt/bar", 0777);
> 
>         /*
>          * For illustration use a bunch of file descriptors in the upper
>          * range that are unused.
>          */
>         for (int i = 10000; i >= 256; i--) {
>                 printf("I'm: /proc/%d/\n", getpid());
> 
>                 int fd2 = open("/mnt/foo", O_RDONLY);
>                 if (fd2 < 0) {
>                         printf("%m - Failed to open\n");
>                         _exit(1);
>                 }
> 
>                 int newfd = dup2(fd2, i);
>                 if (newfd < 0) {
>                         printf("%m - Failed to dup\n");
>                         _exit(1);
>                 }
>                 close(fd2);
> 
>                 sprintf(path, "/proc/%d/fd/%d", getpid(), newfd);
>                 int fd = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
>                 if (fd < 0) {
>                         printf("%m - Failed to open\n");
>                         _exit(3);
>                 }
> 
>                 sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
>                 printf("Mounting on top of %s\n", path);
>                 if (mount("/mnt/bar", path, "", MS_BIND, 0)) {
>                         printf("%m - Failed to mount\n");
>                         _exit(4);
>                 }
> 
>                 close(newfd);
>                 close(fd2);
>         }
> 
>         /*
>          * Give some time to look at things. The mounts now linger until
>          * the process exits.
>          */
>         sleep(10000);
>         _exit(0);
> }
> 
> Co-developed-by: Aleksa Sarai <cyphar@cyphar.com>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

I'm always down to restrict /proc, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
                   ` (6 preceding siblings ...)
  2024-08-07 14:33 ` [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Josef Bacik
@ 2025-09-30  8:53 ` Petr Vorel
  2025-09-30 15:51 ` Aleksa Sarai
  8 siblings, 0 replies; 11+ messages in thread
From: Petr Vorel @ 2025-09-30  8:53 UTC (permalink / raw)
  To: brauner, Linus Torvalds, linux-fsdevel
  Cc: cyphar, jack, viro, Greg Kroah-Hartman, Sasha Levin, Kees Cook,
	Josef Bacik

From: Christian Brauner <brauner@kernel.org>

Hi Christian, all,

> (Preface because I've been panick-approached by people at conference
>  when we discussed this before: overmounting any global procfs files
>  such as /proc/status remains unaffected and is an existing and
>  supported use-case.)

> It is currently possible to mount on top of various ephemeral entities
> in procfs. This specifically includes magic links. To recap, magic links
> are links of the form /proc/<pid>/fd/<nr>. They serve as references to
> a target file and during path lookup they cause a jump to the target
> path. Such magic links disappear if the corresponding file descriptor is
> closed.

> Currently it is possible to overmount such magic links:

> int fd = open("/mnt/foo", O_RDONLY);
> sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
> int fd2 = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
> mount("/mnt/bar", path, "", MS_BIND, 0);

> Arguably, this is nonsensical and is mostly interesting for an attacker
> that wants to somehow trick a process into e.g., reopening something
> that they didn't intend to reopen or to hide a malicious file
> descriptor.

> But also it risks leaking mounts for long-running processes. When
> overmounting a magic link like above, the mount will not be detached
> when the file descriptor is closed. Only the target mountpoint will
> disappear. Which has the consequence of making it impossible to unmount
> that mount afterwards. So the mount will stick around until the process
> exits and the /proc/<pid>/ directory is cleaned up during
> proc_flush_pid() when the dentries are pruned and invalidated.

> That in turn means it's possible for a program to accidentally leak
> mounts and it's also possible to make a task leak mounts without it's
> knowledge if the attacker just keeps overmounting things under
> /proc/<pid>/fd/<nr>.

> I think it's wrong to try and fix this by us starting to play games with
> close() or somewhere else to undo these mounts when the file descriptor
> is closed. The fact that we allow overmounting of such magic links is
> simply a bug and one that we need to fix.

> Similar things can be said about entries under fdinfo/ and map_files/ so
> those are restricted as well.

> I have a further more aggressive patch that gets out the big hammer and
> makes everything under /proc/<pid>/*, as well as immediate symlinks such
> as /proc/self, /proc/thread-self, /proc/mounts, /proc/net that point
> into /proc/<pid>/ not overmountable. Imho, all of this should be blocked
> if we can get away with it. It's only useful to hide exploits such as in [1].

> And again, overmounting of any global procfs files remains unaffected
> and is an existing and supported use-case.

> Link: https://righteousit.com/2024/07/24/hiding-linux-processes-with-bind-mounts [1]

this is fixing a security issue, right? Wouldn't it be worth to backport these
commits to active stable/LTS kernels. I guess it was considered as a new feature
that's why it was not backported (looking at 6.11, 6.6 and 6.1).

Kind regards,
Petr

...

> Co-developed-by: Aleksa Sarai <cyphar@cyphar.com>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities
  2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
                   ` (7 preceding siblings ...)
  2025-09-30  8:53 ` Petr Vorel
@ 2025-09-30 15:51 ` Aleksa Sarai
  8 siblings, 0 replies; 11+ messages in thread
From: Aleksa Sarai @ 2025-09-30 15:51 UTC (permalink / raw)
  To: Petr Vorel
  Cc: brauner, Linus Torvalds, linux-fsdevel, jack, viro,
	Greg Kroah-Hartman, Sasha Levin, Kees Cook, Josef Bacik

[-- Attachment #1: Type: text/plain, Size: 4203 bytes --]

On 2025-09-30, Petr Vorel <pvorel@suse.cz> wrote:
> From: Christian Brauner <brauner@kernel.org>
> 
> Hi Christian, all,
> 
> > (Preface because I've been panick-approached by people at conference
> >  when we discussed this before: overmounting any global procfs files
> >  such as /proc/status remains unaffected and is an existing and
> >  supported use-case.)
> 
> > It is currently possible to mount on top of various ephemeral entities
> > in procfs. This specifically includes magic links. To recap, magic links
> > are links of the form /proc/<pid>/fd/<nr>. They serve as references to
> > a target file and during path lookup they cause a jump to the target
> > path. Such magic links disappear if the corresponding file descriptor is
> > closed.
> 
> > Currently it is possible to overmount such magic links:
> 
> > int fd = open("/mnt/foo", O_RDONLY);
> > sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
> > int fd2 = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
> > mount("/mnt/bar", path, "", MS_BIND, 0);
> 
> > Arguably, this is nonsensical and is mostly interesting for an attacker
> > that wants to somehow trick a process into e.g., reopening something
> > that they didn't intend to reopen or to hide a malicious file
> > descriptor.
> 
> > But also it risks leaking mounts for long-running processes. When
> > overmounting a magic link like above, the mount will not be detached
> > when the file descriptor is closed. Only the target mountpoint will
> > disappear. Which has the consequence of making it impossible to unmount
> > that mount afterwards. So the mount will stick around until the process
> > exits and the /proc/<pid>/ directory is cleaned up during
> > proc_flush_pid() when the dentries are pruned and invalidated.
> 
> > That in turn means it's possible for a program to accidentally leak
> > mounts and it's also possible to make a task leak mounts without it's
> > knowledge if the attacker just keeps overmounting things under
> > /proc/<pid>/fd/<nr>.
> 
> > I think it's wrong to try and fix this by us starting to play games with
> > close() or somewhere else to undo these mounts when the file descriptor
> > is closed. The fact that we allow overmounting of such magic links is
> > simply a bug and one that we need to fix.
> 
> > Similar things can be said about entries under fdinfo/ and map_files/ so
> > those are restricted as well.
> 
> > I have a further more aggressive patch that gets out the big hammer and
> > makes everything under /proc/<pid>/*, as well as immediate symlinks such
> > as /proc/self, /proc/thread-self, /proc/mounts, /proc/net that point
> > into /proc/<pid>/ not overmountable. Imho, all of this should be blocked
> > if we can get away with it. It's only useful to hide exploits such as in [1].
> 
> > And again, overmounting of any global procfs files remains unaffected
> > and is an existing and supported use-case.
> 
> > Link: https://righteousit.com/2024/07/24/hiding-linux-processes-with-bind-mounts [1]
> 
> this is fixing a security issue, right? Wouldn't it be worth to backport these
> commits to active stable/LTS kernels. I guess it was considered as a new feature
> that's why it was not backported (looking at 6.11, 6.6 and 6.1).

It's a security hardening against some attack methods, but it's not
necessarily fixing a specific security issue. It is possible to operate
on such paths safely in most situations[1] so it's not a critical issue
(though I am happy to see the fix, as it makes openat2 more trivially
useful for this case and protects unprivileged programs).

I suspect Christian didn't mark it for stable because there was a risk
of breaking programs (fwiw, it did "break" some tests I had for the
library I linked in [1] since we had test scenarios to make sure we
handled this attack pattern safely, but that was expected -- I don't
think any actual user was affected by this change). And yes, it's
arguably a new feature.

[1]: https://docs.rs/pathrs/latest/pathrs/procfs/struct.ProcfsHandle.html#method.open_follow

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 265 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-09-30 15:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-06 16:02 [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Christian Brauner
2024-08-06 16:02 ` [PATCH RFC 1/6] proc: proc_readfd() -> proc_fd_iterate_shared() Christian Brauner
2024-08-06 17:06   ` Linus Torvalds
2024-08-06 16:02 ` [PATCH RFC 2/6] proc: proc_readfdinfo() -> proc_fdinfo_iterate_shared() Christian Brauner
2024-08-06 16:02 ` [PATCH RFC 3/6] proc: add proc_splice_unmountable() Christian Brauner
2024-08-06 16:02 ` [PATCH RFC 4/6] proc: block mounting on top of /proc/<pid>/map_files/* Christian Brauner
2024-08-06 16:02 ` [PATCH RFC 5/6] proc: block mounting on top of /proc/<pid>/fd/* Christian Brauner
2024-08-06 16:02 ` [PATCH RFC 6/6] proc: block mounting on top of /proc/<pid>/fdinfo/* Christian Brauner
2024-08-07 14:33 ` [PATCH RFC 0/6] proc: restrict overmounting of ephemeral entities Josef Bacik
2025-09-30  8:53 ` Petr Vorel
2025-09-30 15:51 ` Aleksa Sarai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).