From: Christian Brauner <brauner@kernel.org>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Gladkov <legion@kernel.org>,
Dan Klishch <danilklishch@gmail.com>,
Al Viro <viro@zeniv.linux.org.uk>,
"Eric W . Biederman" <ebiederm@xmission.com>,
Kees Cook <keescook@chromium.org>,
containers@lists.linux.dev, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 4/5] proc: Skip the visibility check if subset=pid is used
Date: Thu, 16 Apr 2026 15:30:42 +0200 [thread overview]
Message-ID: <20260416-gewusel-dreck-63a7fc32a586@brauner> (raw)
In-Reply-To: <2026-04-16-popular-long-bicep-bistros-8teEl3@cyphar.com>
[-- Attachment #1: Type: text/plain, Size: 2439 bytes --]
On Thu, Apr 16, 2026 at 10:46:50PM +1000, Aleksa Sarai wrote:
> On 2026-04-16, Aleksa Sarai <cyphar@cyphar.com> wrote:
> > On 2026-04-13, Alexey Gladkov <legion@kernel.org> wrote:
> > > When procfs is mounted with the subset=pid option, all system files and
> > > directories from the root of the filesystem are not accessible in
> > > userspace. Only dynamic information about processes is available, which
> > > cannot be hidden with overmount.
> > >
> > > For this reason, checking for full visibility is not relevant if mounting
> > > is performed with the subset=pid option.
> > >
> > > Signed-off-by: Alexey Gladkov <legion@kernel.org>
> > > ---
> >
> > > -static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
> > > +static bool mount_too_revealing(struct fs_context *fc, int *new_mnt_flags)
> > > {
> > > const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
> > > struct mnt_namespace *ns = current->nsproxy->mnt_ns;
> > > + const struct super_block *sb = fc->root->d_sb;
> > > unsigned long s_iflags;
> > >
> > > if (ns->user_ns == &init_user_ns)
> > > @@ -6388,7 +6387,7 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
> > > return true;
> > > }
> > >
> > > - return !mnt_already_visible(ns, sb, new_mnt_flags);
> > > + return (!fc->skip_visibility && !mnt_already_visible(ns, sb, new_mnt_flags));
> > > }
> >
> > Unless I'm missing something (I haven't tested this locally yet, sorry),
> > this will allow you to bypass mount_too_revealing() even for
> > non-subset=pid mounts because once you create a subset=pid mount then a
> > regular procfs mount will see the subset=pid mount and permit it.
> >
> > I think the solution is quite simple -- you can also skip super-blocks
> > that have fc->skip_visibility set in mnt_already_visible().
>
> I now see that check was present in v8 but I guess its importance wasn't
> obvious. I guess this means we will need to reintroduce
> SB_I_USERNS_ALLOW_REVEALING. :/
I've been playing with something else. So first we should move
SB_I_USERNS_REVEALING to an fs_type flag. It's not an optional thing and
always set and never removed. That also means we can simplify
sysfs_get_tree() to just kernfs_get_tree().
And then we raise SB_I_USERNS_RESTRICTED on all procfs mounts with
pid_only and disallow using them for calculating mount permissions for
unrestricted procfs mounts.
Aleksa?
[-- Attachment #2: 0001-FOLD-Don-t-allow-to-abuse-restricted-procfs-mounts-t.patch --]
[-- Type: text/x-diff, Size: 2549 bytes --]
From 1041e9c62d74ffd03c7bfc748cc952714ab51f01 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 16 Apr 2026 15:22:15 +0200
Subject: [PATCH] [FOLD]: Don't allow to abuse restricted procfs mounts to
mount unrestricted ones
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
fs/namespace.c | 11 ++++++++++-
fs/proc/root.c | 5 ++++-
include/linux/fs/super_types.h | 1 +
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 7b171a67dd50..099ce44e46b5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6351,10 +6351,19 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
guard(namespace_shared)();
hlist_for_each_entry(mnt, &ns->mnt_visible_mounts, mnt_ns_visible) {
+ const struct super_block *sb_visible = mnt->mnt.mnt_sb;
struct mount *child;
int mnt_flags;
- if (mnt->mnt.mnt_sb->s_type != sb->s_type)
+ if (sb_visible->s_type != sb->s_type)
+ continue;
+
+ /*
+ * If the new superblock is not going to be restricted then any
+ * mount that is restricted cannot be used to allow it.
+ */
+ if (!(sb->s_iflags & SB_I_USERNS_RESTRICTED) &&
+ (sb_visible->s_iflags & SB_I_USERNS_RESTRICTED))
continue;
/* A local view of the mount flags */
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 350fc4f888ab..a035467b465b 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -267,8 +267,11 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
* The dynamic part of procfs cannot be hidden using overmount.
* Therefore, the check for "not fully visible" can be skipped.
*/
- if (fs_info->pidonly)
+ if (fs_info->pidonly) {
fc->skip_visibility = true;
+ /* Indicate that this procfs instance hides a bunch of files. */
+ s->s_iflags |= SB_I_USERNS_RESTRICTED;
+ }
/* User space would break if executables or devices appear on proc */
s->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
diff --git a/include/linux/fs/super_types.h b/include/linux/fs/super_types.h
index 182efbeb9520..4ae7d0c3d55b 100644
--- a/include/linux/fs/super_types.h
+++ b/include/linux/fs/super_types.h
@@ -326,6 +326,7 @@ struct super_block {
#define SB_I_STABLE_WRITES 0x00000008 /* don't modify blks until WB is done */
/* sb->s_iflags to limit user namespace mounts */
+#define SB_I_USERNS_RESTRICTED 0x00000010
#define SB_I_IMA_UNVERIFIABLE_SIGNATURE 0x00000020
#define SB_I_UNTRUSTED_MOUNTER 0x00000040
#define SB_I_EVM_HMAC_UNSUPPORTED 0x00000080
--
2.47.3
[-- Attachment #3: 0001-fs-move-SB_I_USERNS_VISIBLE-to-FS_USERNS_MOUNT_RESTR.patch --]
[-- Type: text/x-diff, Size: 5152 bytes --]
From bcb9e887619cbf5f2cd0a6a80b299e3a5fc2692a Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 16 Apr 2026 15:04:31 +0200
Subject: [PATCH 1/2] fs: move SB_I_USERNS_VISIBLE to
FS_USERNS_MOUNT_RESTRICTED
Whether a filesystem's mounts need to undergo a visibility check in user
namespaces is a static property of the filesystem type, not a runtime
property of each superblock instance. Both proc and sysfs always set
SB_I_USERNS_VISIBLE on their superblocks unconditionally (sysfs does so
on first creation, and subsequent mounts reuse the same superblock).
Move this flag from sb->s_iflags (SB_I_USERNS_VISIBLE) to
file_system_type->fs_flags (FS_USERNS_MOUNT_RESTRICTED) so the intent
is expressed at the filesystem type level where it belongs.
All check sites are updated to test sb->s_type->fs_flags instead of
sb->s_iflags. The SB_I_NOEXEC and SB_I_NODEV flags remain on the
superblock as they are runtime properties set during fill_super.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
fs/namespace.c | 4 ++--
fs/proc/root.c | 4 ++--
fs/sysfs/mount.c | 4 +---
include/linux/fs.h | 1 +
include/linux/fs/super_types.h | 1 -
kernel/acct.c | 2 +-
6 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index fe919abd2f01..a60ddfe71c7a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6405,10 +6405,10 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
return false;
/* Can this filesystem be too revealing? */
- s_iflags = sb->s_iflags;
- if (!(s_iflags & SB_I_USERNS_VISIBLE))
+ if (!(sb->s_type->fs_flags & FS_USERNS_MOUNT_RESTRICTED))
return false;
+ s_iflags = sb->s_iflags;
if ((s_iflags & required_iflags) != required_iflags) {
WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
required_iflags);
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 0f9100559471..b65053f9f046 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -257,7 +257,7 @@ static int proc_fill_super(struct super_block *s, struct fs_context *fc)
proc_apply_options(fs_info, fc, current_user_ns());
/* User space would break if executables or devices appear on proc */
- s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
+ s->s_iflags |= SB_I_NOEXEC | SB_I_NODEV;
s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
s->s_blocksize = 1024;
s->s_blocksize_bits = 10;
@@ -359,7 +359,7 @@ static struct file_system_type proc_fs_type = {
.init_fs_context = proc_init_fs_context,
.parameters = proc_fs_parameters,
.kill_sb = proc_kill_sb,
- .fs_flags = FS_USERNS_MOUNT | FS_DISALLOW_NOTIFY_PERM,
+ .fs_flags = FS_USERNS_MOUNT | FS_USERNS_MOUNT_RESTRICTED | FS_DISALLOW_NOTIFY_PERM,
};
void __init proc_root_init(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index b199e8ff79b1..b45ea5d511e7 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -32,8 +32,6 @@ static int sysfs_get_tree(struct fs_context *fc)
if (ret)
return ret;
- if (kfc->new_sb_created)
- fc->root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE;
return 0;
}
@@ -93,7 +91,7 @@ static struct file_system_type sysfs_fs_type = {
.name = "sysfs",
.init_fs_context = sysfs_init_fs_context,
.kill_sb = sysfs_kill_sb,
- .fs_flags = FS_USERNS_MOUNT,
+ .fs_flags = FS_USERNS_MOUNT | FS_USERNS_MOUNT_RESTRICTED,
};
int __init sysfs_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b5b01bb22d12..17a6baefb7d3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2280,6 +2280,7 @@ struct file_system_type {
#define FS_MGTIME 64 /* FS uses multigrain timestamps */
#define FS_LBS 128 /* FS supports LBS */
#define FS_POWER_FREEZE 256 /* Always freeze on suspend/hibernate */
+#define FS_USERNS_MOUNT_RESTRICTED 512 /* Restrict mount in userns if not already visible */
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
int (*init_fs_context)(struct fs_context *);
const struct fs_parameter_spec *parameters;
diff --git a/include/linux/fs/super_types.h b/include/linux/fs/super_types.h
index 383050e7fdf5..182efbeb9520 100644
--- a/include/linux/fs/super_types.h
+++ b/include/linux/fs/super_types.h
@@ -326,7 +326,6 @@ struct super_block {
#define SB_I_STABLE_WRITES 0x00000008 /* don't modify blks until WB is done */
/* sb->s_iflags to limit user namespace mounts */
-#define SB_I_USERNS_VISIBLE 0x00000010 /* fstype already mounted */
#define SB_I_IMA_UNVERIFIABLE_SIGNATURE 0x00000020
#define SB_I_UNTRUSTED_MOUNTER 0x00000040
#define SB_I_EVM_HMAC_UNSUPPORTED 0x00000080
diff --git a/kernel/acct.c b/kernel/acct.c
index cbbf79d718cf..c440d43479ca 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -249,7 +249,7 @@ static int acct_on(const char __user *name)
return -EINVAL;
/* Exclude procfs and sysfs. */
- if (file_inode(file)->i_sb->s_iflags & SB_I_USERNS_VISIBLE)
+ if (file_inode(file)->i_sb->s_type->fs_flags & FS_USERNS_MOUNT_RESTRICTED)
return -EINVAL;
if (!(file->f_mode & FMODE_CAN_WRITE))
--
2.47.3
[-- Attachment #4: 0002-sysfs-remove-trivial-sysfs_get_tree-wrapper.patch --]
[-- Type: text/x-diff, Size: 1461 bytes --]
From b43adcafc42753065739a22a053417d43694d9ac Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 16 Apr 2026 15:04:47 +0200
Subject: [PATCH 2/2] sysfs: remove trivial sysfs_get_tree() wrapper
Now that FS_USERNS_MOUNT_RESTRICTED is a file_system_type flag,
sysfs_get_tree() is a trivial wrapper around kernfs_get_tree() with no
additional logic. Point sysfs_fs_context_ops.get_tree directly at
kernfs_get_tree() and remove the wrapper.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
fs/sysfs/mount.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index b45ea5d511e7..88c10823fcaf 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -23,18 +23,6 @@
static struct kernfs_root *sysfs_root;
struct kernfs_node *sysfs_root_kn;
-static int sysfs_get_tree(struct fs_context *fc)
-{
- struct kernfs_fs_context *kfc = fc->fs_private;
- int ret;
-
- ret = kernfs_get_tree(fc);
- if (ret)
- return ret;
-
- return 0;
-}
-
static void sysfs_fs_context_free(struct fs_context *fc)
{
struct kernfs_fs_context *kfc = fc->fs_private;
@@ -47,7 +35,7 @@ static void sysfs_fs_context_free(struct fs_context *fc)
static const struct fs_context_operations sysfs_fs_context_ops = {
.free = sysfs_fs_context_free,
- .get_tree = sysfs_get_tree,
+ .get_tree = kernfs_get_tree,
};
static int sysfs_init_fs_context(struct fs_context *fc)
--
2.47.3
next prev parent reply other threads:[~2026-04-16 13:30 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-16 10:45 [RESEND PATCH v6 0/5] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2021-07-16 10:45 ` [RESEND PATCH v6 1/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 4/5] proc: Relax check of mount visibility Alexey Gladkov
2021-07-16 10:46 ` [RESEND PATCH v6 5/5] docs: proc: add documentation about relaxing visibility restrictions Alexey Gladkov
2025-12-13 5:06 ` [RESEND PATCH v6 0/5] proc: subset=pid: Relax check of mount visibility Dan Klishch
2025-12-13 10:49 ` Alexey Gladkov
2025-12-13 18:00 ` Dan Klishch
2025-12-14 16:40 ` Alexey Gladkov
2025-12-14 18:02 ` Dan Klishch
2025-12-15 10:10 ` Alexey Gladkov
2025-12-15 14:46 ` Dan Klishch
2025-12-15 14:58 ` Alexey Gladkov
2025-12-24 12:55 ` Christian Brauner
2026-01-30 13:34 ` Alexey Gladkov
2025-12-15 11:30 ` Christian Brauner
2026-01-13 9:20 ` [PATCH v7 " Alexey Gladkov
2026-01-13 9:20 ` [PATCH v7 1/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
2026-01-13 9:20 ` [PATCH v7 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-02-04 14:39 ` Christian Brauner
2026-02-11 19:35 ` Alexey Gladkov
2026-01-13 9:20 ` [PATCH v7 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2026-01-13 9:20 ` [PATCH v7 4/5] proc: Relax check of mount visibility Alexey Gladkov
2026-01-13 9:20 ` [PATCH v7 5/5] docs: proc: add documentation about relaxing visibility restrictions Alexey Gladkov
2026-02-13 10:44 ` [PATCH v8 0/5] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2026-02-13 10:44 ` [PATCH v8 1/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
2026-02-13 10:44 ` [PATCH v8 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-02-13 10:44 ` [PATCH v8 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2026-02-13 10:44 ` [PATCH v8 4/5] proc: Relax check of mount visibility Alexey Gladkov
2026-02-17 11:59 ` Christian Brauner
2026-04-10 11:12 ` Christian Brauner
2026-04-10 11:31 ` Alexey Gladkov
2026-04-14 9:55 ` Christian Brauner
2026-02-13 10:44 ` [PATCH v8 5/5] docs: proc: add documentation about relaxing visibility restrictions Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 0/5] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 1/5] namespace: record fully visible mounts in list Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 4/5] proc: Skip the visibility check if subset=pid is used Alexey Gladkov
2026-04-16 12:30 ` Aleksa Sarai
2026-04-16 12:46 ` Aleksa Sarai
2026-04-16 13:30 ` Christian Brauner [this message]
2026-04-16 15:03 ` Aleksa Sarai
2026-04-16 12:52 ` Christian Brauner
2026-04-13 11:19 ` [PATCH v9 5/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260416-gewusel-dreck-63a7fc32a586@brauner \
--to=brauner@kernel.org \
--cc=containers@lists.linux.dev \
--cc=cyphar@cyphar.com \
--cc=danilklishch@gmail.com \
--cc=ebiederm@xmission.com \
--cc=keescook@chromium.org \
--cc=legion@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox