public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Gladkov <legion@kernel.org>,
	 Dan Klishch <danilklishch@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	 "Eric W . Biederman" <ebiederm@xmission.com>,
	Kees Cook <keescook@chromium.org>,
	containers@lists.linux.dev,  linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 4/5] proc: Skip the visibility check if subset=pid is used
Date: Thu, 16 Apr 2026 15:30:42 +0200	[thread overview]
Message-ID: <20260416-gewusel-dreck-63a7fc32a586@brauner> (raw)
In-Reply-To: <2026-04-16-popular-long-bicep-bistros-8teEl3@cyphar.com>

[-- Attachment #1: Type: text/plain, Size: 2439 bytes --]

On Thu, Apr 16, 2026 at 10:46:50PM +1000, Aleksa Sarai wrote:
> On 2026-04-16, Aleksa Sarai <cyphar@cyphar.com> wrote:
> > On 2026-04-13, Alexey Gladkov <legion@kernel.org> wrote:
> > > When procfs is mounted with the subset=pid option, all system files and
> > > directories from the root of the filesystem are not accessible in
> > > userspace. Only dynamic information about processes is available, which
> > > cannot be hidden with overmount.
> > > 
> > > For this reason, checking for full visibility is not relevant if mounting
> > > is performed with the subset=pid option.
> > > 
> > > Signed-off-by: Alexey Gladkov <legion@kernel.org>
> > > ---
> > 
> > > -static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
> > > +static bool mount_too_revealing(struct fs_context *fc, int *new_mnt_flags)
> > >  {
> > >  	const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
> > >  	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
> > > +	const struct super_block *sb = fc->root->d_sb;
> > >  	unsigned long s_iflags;
> > >  
> > >  	if (ns->user_ns == &init_user_ns)
> > > @@ -6388,7 +6387,7 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
> > >  		return true;
> > >  	}
> > >  
> > > -	return !mnt_already_visible(ns, sb, new_mnt_flags);
> > > +	return (!fc->skip_visibility && !mnt_already_visible(ns, sb, new_mnt_flags));
> > >  }
> > 
> > Unless I'm missing something (I haven't tested this locally yet, sorry),
> > this will allow you to bypass mount_too_revealing() even for
> > non-subset=pid mounts because once you create a subset=pid mount then a
> > regular procfs mount will see the subset=pid mount and permit it.
> > 
> > I think the solution is quite simple -- you can also skip super-blocks
> > that have fc->skip_visibility set in mnt_already_visible().
> 
> I now see that check was present in v8 but I guess its importance wasn't
> obvious. I guess this means we will need to reintroduce
> SB_I_USERNS_ALLOW_REVEALING. :/

I've been playing with something else. So first we should move
SB_I_USERNS_REVEALING to an fs_type flag. It's not an optional thing and
always set and never removed. That also means we can simplify
sysfs_get_tree() to just kernfs_get_tree().

And then we raise SB_I_USERNS_RESTRICTED on all procfs mounts with
pid_only and disallow using them for calculating mount permissions for
unrestricted procfs mounts.

Aleksa?

[-- Attachment #2: 0001-FOLD-Don-t-allow-to-abuse-restricted-procfs-mounts-t.patch --]
[-- Type: text/x-diff, Size: 2549 bytes --]

From 1041e9c62d74ffd03c7bfc748cc952714ab51f01 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 16 Apr 2026 15:22:15 +0200
Subject: [PATCH] [FOLD]: Don't allow to abuse restricted procfs mounts to
 mount unrestricted ones

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/namespace.c                 | 11 ++++++++++-
 fs/proc/root.c                 |  5 ++++-
 include/linux/fs/super_types.h |  1 +
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7b171a67dd50..099ce44e46b5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6351,10 +6351,19 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 
 	guard(namespace_shared)();
 	hlist_for_each_entry(mnt, &ns->mnt_visible_mounts, mnt_ns_visible) {
+		const struct super_block *sb_visible = mnt->mnt.mnt_sb;
 		struct mount *child;
 		int mnt_flags;
 
-		if (mnt->mnt.mnt_sb->s_type != sb->s_type)
+		if (sb_visible->s_type != sb->s_type)
+			continue;
+
+		/*
+		 * If the new superblock is not going to be restricted then any
+		 * mount that is restricted cannot be used to allow it.
+		 */
+		if (!(sb->s_iflags & SB_I_USERNS_RESTRICTED) &&
+		    (sb_visible->s_iflags & SB_I_USERNS_RESTRICTED))
 			continue;
 
 		/* A local view of the mount flags */
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 350fc4f888ab..a035467b465b 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -267,8 +267,11 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
 	 * The dynamic part of procfs cannot be hidden using overmount.
 	 * Therefore, the check for "not fully visible" can be skipped.
 	 */
-	if (fs_info->pidonly)
+	if (fs_info->pidonly) {
 		fc->skip_visibility = true;
+		/* Indicate that this procfs instance hides a bunch of files. */
+		s->s_iflags |= SB_I_USERNS_RESTRICTED;
+	}
 
 	/* User space would break if executables or devices appear on proc */
 	s->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
diff --git a/include/linux/fs/super_types.h b/include/linux/fs/super_types.h
index 182efbeb9520..4ae7d0c3d55b 100644
--- a/include/linux/fs/super_types.h
+++ b/include/linux/fs/super_types.h
@@ -326,6 +326,7 @@ struct super_block {
 #define SB_I_STABLE_WRITES 0x00000008	/* don't modify blks until WB is done */
 
 /* sb->s_iflags to limit user namespace mounts */
+#define SB_I_USERNS_RESTRICTED		0x00000010
 #define SB_I_IMA_UNVERIFIABLE_SIGNATURE	0x00000020
 #define SB_I_UNTRUSTED_MOUNTER		0x00000040
 #define SB_I_EVM_HMAC_UNSUPPORTED	0x00000080
-- 
2.47.3


[-- Attachment #3: 0001-fs-move-SB_I_USERNS_VISIBLE-to-FS_USERNS_MOUNT_RESTR.patch --]
[-- Type: text/x-diff, Size: 5152 bytes --]

From bcb9e887619cbf5f2cd0a6a80b299e3a5fc2692a Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 16 Apr 2026 15:04:31 +0200
Subject: [PATCH 1/2] fs: move SB_I_USERNS_VISIBLE to
 FS_USERNS_MOUNT_RESTRICTED

Whether a filesystem's mounts need to undergo a visibility check in user
namespaces is a static property of the filesystem type, not a runtime
property of each superblock instance. Both proc and sysfs always set
SB_I_USERNS_VISIBLE on their superblocks unconditionally (sysfs does so
on first creation, and subsequent mounts reuse the same superblock).

Move this flag from sb->s_iflags (SB_I_USERNS_VISIBLE) to
file_system_type->fs_flags (FS_USERNS_MOUNT_RESTRICTED) so the intent
is expressed at the filesystem type level where it belongs.

All check sites are updated to test sb->s_type->fs_flags instead of
sb->s_iflags. The SB_I_NOEXEC and SB_I_NODEV flags remain on the
superblock as they are runtime properties set during fill_super.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/namespace.c                 | 4 ++--
 fs/proc/root.c                 | 4 ++--
 fs/sysfs/mount.c               | 4 +---
 include/linux/fs.h             | 1 +
 include/linux/fs/super_types.h | 1 -
 kernel/acct.c                  | 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..a60ddfe71c7a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6405,10 +6405,10 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 		return false;
 
 	/* Can this filesystem be too revealing? */
-	s_iflags = sb->s_iflags;
-	if (!(s_iflags & SB_I_USERNS_VISIBLE))
+	if (!(sb->s_type->fs_flags & FS_USERNS_MOUNT_RESTRICTED))
 		return false;
 
+	s_iflags = sb->s_iflags;
 	if ((s_iflags & required_iflags) != required_iflags) {
 		WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
 			  required_iflags);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 0f9100559471..b65053f9f046 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -257,7 +257,7 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
 	proc_apply_options(fs_info, fc, current_user_ns());
 
 	/* User space would break if executables or devices appear on proc */
-	s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
+	s->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
 	s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
@@ -359,7 +359,7 @@ static struct file_system_type proc_fs_type = {
 	.init_fs_context	= proc_init_fs_context,
 	.parameters		= proc_fs_parameters,
 	.kill_sb		= proc_kill_sb,
-	.fs_flags		= FS_USERNS_MOUNT | FS_DISALLOW_NOTIFY_PERM,
+	.fs_flags		= FS_USERNS_MOUNT | FS_USERNS_MOUNT_RESTRICTED | FS_DISALLOW_NOTIFY_PERM,
 };
 
 void __init proc_root_init(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index b199e8ff79b1..b45ea5d511e7 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -32,8 +32,6 @@ static int sysfs_get_tree(struct fs_context *fc)
 	if (ret)
 		return ret;
 
-	if (kfc->new_sb_created)
-		fc->root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
 	return 0;
 }
 
@@ -93,7 +91,7 @@ static struct file_system_type sysfs_fs_type = {
 	.name			= "sysfs",
 	.init_fs_context	= sysfs_init_fs_context,
 	.kill_sb		= sysfs_kill_sb,
-	.fs_flags		= FS_USERNS_MOUNT,
+	.fs_flags		= FS_USERNS_MOUNT | FS_USERNS_MOUNT_RESTRICTED,
 };
 
 int __init sysfs_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b5b01bb22d12..17a6baefb7d3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2280,6 +2280,7 @@ struct file_system_type {
 #define FS_MGTIME		64	/* FS uses multigrain timestamps */
 #define FS_LBS			128	/* FS supports LBS */
 #define FS_POWER_FREEZE		256	/* Always freeze on suspend/hibernate */
+#define FS_USERNS_MOUNT_RESTRICTED 512	/* Restrict mount in userns if not already visible */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
diff --git a/include/linux/fs/super_types.h b/include/linux/fs/super_types.h
index 383050e7fdf5..182efbeb9520 100644
--- a/include/linux/fs/super_types.h
+++ b/include/linux/fs/super_types.h
@@ -326,7 +326,6 @@ struct super_block {
 #define SB_I_STABLE_WRITES 0x00000008	/* don't modify blks until WB is done */
 
 /* sb->s_iflags to limit user namespace mounts */
-#define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
 #define SB_I_IMA_UNVERIFIABLE_SIGNATURE	0x00000020
 #define SB_I_UNTRUSTED_MOUNTER		0x00000040
 #define SB_I_EVM_HMAC_UNSUPPORTED	0x00000080
diff --git a/kernel/acct.c b/kernel/acct.c
index cbbf79d718cf..c440d43479ca 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -249,7 +249,7 @@ static int acct_on(const char __user *name)
 		return -EINVAL;
 
 	/* Exclude procfs and sysfs. */
-	if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE)
+	if (file_inode(file)->i_sb->s_type->fs_flags & FS_USERNS_MOUNT_RESTRICTED)
 		return -EINVAL;
 
 	if (!(file->f_mode & FMODE_CAN_WRITE))
-- 
2.47.3


[-- Attachment #4: 0002-sysfs-remove-trivial-sysfs_get_tree-wrapper.patch --]
[-- Type: text/x-diff, Size: 1461 bytes --]

From b43adcafc42753065739a22a053417d43694d9ac Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 16 Apr 2026 15:04:47 +0200
Subject: [PATCH 2/2] sysfs: remove trivial sysfs_get_tree() wrapper

Now that FS_USERNS_MOUNT_RESTRICTED is a file_system_type flag,
sysfs_get_tree() is a trivial wrapper around kernfs_get_tree() with no
additional logic. Point sysfs_fs_context_ops.get_tree directly at
kernfs_get_tree() and remove the wrapper.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/sysfs/mount.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index b45ea5d511e7..88c10823fcaf 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -23,18 +23,6 @@
 static struct kernfs_root *sysfs_root;
 struct kernfs_node *sysfs_root_kn;
 
-static int sysfs_get_tree(struct fs_context *fc)
-{
-	struct kernfs_fs_context *kfc = fc->fs_private;
-	int ret;
-
-	ret = kernfs_get_tree(fc);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
 static void sysfs_fs_context_free(struct fs_context *fc)
 {
 	struct kernfs_fs_context *kfc = fc->fs_private;
@@ -47,7 +35,7 @@ static void sysfs_fs_context_free(struct fs_context *fc)
 
 static const struct fs_context_operations sysfs_fs_context_ops = {
 	.free		= sysfs_fs_context_free,
-	.get_tree	= sysfs_get_tree,
+	.get_tree	= kernfs_get_tree,
 };
 
 static int sysfs_init_fs_context(struct fs_context *fc)
-- 
2.47.3


  reply	other threads:[~2026-04-16 13:30 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-16 10:45 [RESEND PATCH v6 0/5] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2021-07-16 10:45 ` [RESEND PATCH v6 1/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 4/5] proc: Relax check of mount visibility Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 5/5] docs: proc: add documentation about relaxing visibility restrictions Alexey Gladkov
2025-12-13  5:06 ` [RESEND PATCH v6 0/5] proc: subset=pid: Relax check of mount visibility Dan Klishch
2025-12-13 10:49   ` Alexey Gladkov
2025-12-13 18:00     ` Dan Klishch
2025-12-14 16:40       ` Alexey Gladkov
2025-12-14 18:02         ` Dan Klishch
2025-12-15 10:10           ` Alexey Gladkov
2025-12-15 14:46             ` Dan Klishch
2025-12-15 14:58               ` Alexey Gladkov
2025-12-24 12:55                 ` Christian Brauner
2026-01-30 13:34                   ` Alexey Gladkov
2025-12-15 11:30           ` Christian Brauner
2026-01-13  9:20   ` [PATCH v7 " Alexey Gladkov
2026-01-13  9:20     ` [PATCH v7 1/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
2026-01-13  9:20     ` [PATCH v7 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-02-04 14:39       ` Christian Brauner
2026-02-11 19:35         ` Alexey Gladkov
2026-01-13  9:20     ` [PATCH v7 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2026-01-13  9:20     ` [PATCH v7 4/5] proc: Relax check of mount visibility Alexey Gladkov
2026-01-13  9:20     ` [PATCH v7 5/5] docs: proc: add documentation about relaxing visibility restrictions Alexey Gladkov
2026-02-13 10:44     ` [PATCH v8 0/5] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2026-02-13 10:44       ` [PATCH v8 1/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
2026-02-13 10:44       ` [PATCH v8 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-02-13 10:44       ` [PATCH v8 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2026-02-13 10:44       ` [PATCH v8 4/5] proc: Relax check of mount visibility Alexey Gladkov
2026-02-17 11:59         ` Christian Brauner
2026-04-10 11:12           ` Christian Brauner
2026-04-10 11:31             ` Alexey Gladkov
2026-04-14  9:55               ` Christian Brauner
2026-02-13 10:44       ` [PATCH v8 5/5] docs: proc: add documentation about relaxing visibility restrictions Alexey Gladkov
2026-04-13 11:19       ` [PATCH v9 0/5] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2026-04-13 11:19         ` [PATCH v9 1/5] namespace: record fully visible mounts in list Alexey Gladkov
2026-04-13 11:19         ` [PATCH v9 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-04-13 11:19         ` [PATCH v9 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2026-04-13 11:19         ` [PATCH v9 4/5] proc: Skip the visibility check if subset=pid is used Alexey Gladkov
2026-04-16 12:30           ` Aleksa Sarai
2026-04-16 12:46             ` Aleksa Sarai
2026-04-16 13:30               ` Christian Brauner [this message]
2026-04-16 15:03                 ` Aleksa Sarai
2026-04-16 12:52           ` Christian Brauner
2026-04-13 11:19         ` [PATCH v9 5/5] docs: proc: add documentation about mount restrictions Alexey Gladkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260416-gewusel-dreck-63a7fc32a586@brauner \
    --to=brauner@kernel.org \
    --cc=containers@lists.linux.dev \
    --cc=cyphar@cyphar.com \
    --cc=danilklishch@gmail.com \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=legion@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox