From: Christian Brauner <brauner@kernel.org>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexey Gladkov <legion@kernel.org>,
Dan Klishch <danilklishch@gmail.com>,
Al Viro <viro@zeniv.linux.org.uk>,
"Eric W . Biederman" <ebiederm@xmission.com>,
Kees Cook <keescook@chromium.org>,
containers@lists.linux.dev, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 4/5] proc: Skip the visibility check if subset=pid is used
Date: Tue, 21 Apr 2026 13:51:47 +0200 [thread overview]
Message-ID: <20260421-duzen-laugenbrezel-e1d1138ff5ae@brauner> (raw)
In-Reply-To: <2026-04-16-jaded-sour-find-end-6LhEP4@cyphar.com>
On Fri, Apr 17, 2026 at 01:03:46AM +1000, Aleksa Sarai wrote:
> On 2026-04-16, Christian Brauner <brauner@kernel.org> wrote:
> > On Thu, Apr 16, 2026 at 10:46:50PM +1000, Aleksa Sarai wrote:
> > > On 2026-04-16, Aleksa Sarai <cyphar@cyphar.com> wrote:
> > > > On 2026-04-13, Alexey Gladkov <legion@kernel.org> wrote:
> > > > > When procfs is mounted with the subset=pid option, all system files and
> > > > > directories from the root of the filesystem are not accessible in
> > > > > userspace. Only dynamic information about processes is available, which
> > > > > cannot be hidden with overmount.
> > > > >
> > > > > For this reason, checking for full visibility is not relevant if mounting
> > > > > is performed with the subset=pid option.
> > > > >
> > > > > Signed-off-by: Alexey Gladkov <legion@kernel.org>
> > > > > ---
> > > >
> > > > > -static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
> > > > > +static bool mount_too_revealing(struct fs_context *fc, int *new_mnt_flags)
> > > > > {
> > > > > const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
> > > > > struct mnt_namespace *ns = current->nsproxy->mnt_ns;
> > > > > + const struct super_block *sb = fc->root->d_sb;
> > > > > unsigned long s_iflags;
> > > > >
> > > > > if (ns->user_ns == &init_user_ns)
> > > > > @@ -6388,7 +6387,7 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
> > > > > return true;
> > > > > }
> > > > >
> > > > > - return !mnt_already_visible(ns, sb, new_mnt_flags);
> > > > > + return (!fc->skip_visibility && !mnt_already_visible(ns, sb, new_mnt_flags));
> > > > > }
> > > >
> > > > Unless I'm missing something (I haven't tested this locally yet, sorry),
> > > > this will allow you to bypass mount_too_revealing() even for
> > > > non-subset=pid mounts because once you create a subset=pid mount then a
> > > > regular procfs mount will see the subset=pid mount and permit it.
> > > >
> > > > I think the solution is quite simple -- you can also skip super-blocks
> > > > that have fc->skip_visibility set in mnt_already_visible().
> > >
> > > I now see that check was present in v8 but I guess its importance wasn't
> > > obvious. I guess this means we will need to reintroduce
> > > SB_I_USERNS_ALLOW_REVEALING. :/
> >
> > I've been playing with something else. So first we should move
> > SB_I_USERNS_REVEALING to an fs_type flag. It's not an optional thing and
> > always set and never removed. That also means we can simplify
> > sysfs_get_tree() to just kernfs_get_tree().
>
> Seems quite reasonable to me.
>
> > And then we raise SB_I_USERNS_RESTRICTED on all procfs mounts with
> > pid_only and disallow using them for calculating mount permissions for
> > unrestricted procfs mounts.
>
> I think that's fine here and is probably all we need for now (though I
> think that the name is a little confusing especially given my next
> comment), but I think there is a bit of a deeper problem here that
> deserves to be mentioned if only for posterity.
>
> The core issue really is that we actually have two versions of procfs
> now, and they are effectively distinguished by fs_context flags instead
> of fs_type. Back when subset=pid was merged there was a discussion about
> a hypothetical proc2 -- while that got dropped, we are kind of
> reinventing this split in an ad-hoc way.
>
> Conceptually considering them as two distinct fs_types would be the more
> semantically correct thing to do, though I think it would be overkill to
> come up with some framework for that. (I also really hope we don't have
> any other filesystems where this is also the case.)
Doing this is almost trivial. The problem really is that we'd expose
"procfs2" or whatever as a new mount option to userspace without
actually having fundamentally revamped what we would like procfs2 to do.
IOW, it would elevate a security hack that got added a while ago to a
new filesystem type. I think that would just be sad. So I'm not so
excited about doing this.
> This is why I think the name is a little weird, since it isn't an issue
> about one procfs being restricted in a userns, it's that they are
> conceptually unrelated filesystem types. A very lightweight version of
I somewhat disagree because it's unclear where you draw the line. If you
use the hidepid= stuff at its most restrictive where it only shows tasks
your user owns you could also argue that it's very close to a separate
filesystem type. It's strictly subtractive and so is pidonly. There's
just nothing that's better or novel.
> this would be making it something like SB_I_RESTRICTED_VARIANT that
> would be more a little more strict than what you have now --
> SB_I_RESTRICTED_VARIANT would not be treated as compatible with anything
> (even another SB_I_RESTRICTED_VARIANT mount) so that if we add more
> variants in the future we don't treat them the same either. (Again, I
> don't think we need this now?) Of course this would still allow
> subset=pid without a procfs mount because of the other special casing
> for it in mount_too_revealing().
It's not a problem today. I think we can simply treat
SB_I_RESTRICTED_VARIANT as not allowing other restricted variants. And
we'd need a more elaborate mechanism anyway once we'd have "compatible"
subsets.
> We are also limited in what semantics we can change here too (and I'm a
> little worried that even this bit for SB_I_USERNS_RESTRICTED is at risk
> of breaking something) so maybe that isn't needed today.
So I think yes, SB_I_RESTRICTED_VARIANT is fine by me for now.
next prev parent reply other threads:[~2026-04-21 11:51 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1770979341.git.legion@kernel.org>
2026-04-13 11:19 ` [PATCH v9 0/5] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 1/5] namespace: record fully visible mounts in list Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 2/5] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 3/5] proc: Disable cancellation of subset=pid option Alexey Gladkov
2026-04-13 11:19 ` [PATCH v9 4/5] proc: Skip the visibility check if subset=pid is used Alexey Gladkov
2026-04-16 12:30 ` Aleksa Sarai
2026-04-16 12:46 ` Aleksa Sarai
2026-04-16 13:30 ` Christian Brauner
2026-04-16 15:03 ` Aleksa Sarai
2026-04-21 11:51 ` Christian Brauner [this message]
2026-04-21 12:24 ` Alexey Gladkov
2026-04-22 12:46 ` Christian Brauner
2026-04-22 22:32 ` Aleksa Sarai
2026-04-16 12:52 ` Christian Brauner
2026-04-13 11:19 ` [PATCH v9 5/5] docs: proc: add documentation about mount restrictions Alexey Gladkov
2026-04-27 8:26 ` [PATCH v10 0/7] proc: subset=pid: Relax check of mount visibility Alexey Gladkov
2026-04-27 8:26 ` [PATCH v10 1/7] namespace: record fully visible mounts in list Alexey Gladkov
2026-04-27 8:26 ` [PATCH v10 2/7] fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED Alexey Gladkov
2026-04-27 8:26 ` [PATCH v10 3/7] sysfs: remove trivial sysfs_get_tree() wrapper Alexey Gladkov
2026-04-27 8:26 ` [PATCH v10 4/7] proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN Alexey Gladkov
2026-04-27 8:26 ` [PATCH v10 5/7] proc: prevent reconfiguring subset=pid Alexey Gladkov
2026-04-27 22:31 ` Aleksa Sarai
2026-04-27 8:26 ` [PATCH v10 6/7] proc: handle subset=pid separately in userns visibility checks Alexey Gladkov
2026-04-27 8:26 ` [PATCH v10 7/7] docs: proc: add documentation about mount restrictions Alexey Gladkov
2026-04-27 15:54 ` [PATCH v10 0/7] proc: subset=pid: Relax check of mount visibility Christian Brauner
2026-04-27 22:34 ` Aleksa Sarai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260421-duzen-laugenbrezel-e1d1138ff5ae@brauner \
--to=brauner@kernel.org \
--cc=containers@lists.linux.dev \
--cc=cyphar@cyphar.com \
--cc=danilklishch@gmail.com \
--cc=ebiederm@xmission.com \
--cc=keescook@chromium.org \
--cc=legion@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox