From: "Mickaël Salaün" <mic@digikod.net>
To: Song Liu <song@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, kernel-team@meta.com,
andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev, brauner@kernel.org,
kpsingh@kernel.org, mattbobrowski@google.com,
amir73il@gmail.com, repnop@google.com, jlayton@kernel.org,
josef@toxicpanda.com, gnoack@google.com,
Tingmao Wang <m@maowtm.org>
Subject: Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator
Date: Fri, 30 May 2025 14:20:39 +0200 [thread overview]
Message-ID: <20250530.euz5beesaSha@digikod.net> (raw)
In-Reply-To: <CAPhsuW6-J+NUe=jX51wGVP=nMFjETu+1LUTsWZiBa1ckwq7b+w@mail.gmail.com>
On Thu, May 29, 2025 at 05:42:16PM -0700, Song Liu wrote:
> On Thu, May 29, 2025 at 4:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Thu, May 29, 2025 at 03:13:10PM -0700, Song Liu wrote:
> >
> > > Is it an issue if we only hold a reference to a MNT_LOCKED mount for
> > > short period of time? "Short period" means it may get interrupted, page
> > > faults, or wait for an IO (read xattr), but it won't hold a reference to the
> > > mount and sleep indefinitely.
> >
> > MNT_LOCKED mount itself is not a problem. What shouldn't be done is
> > looking around in the mountpoint it covers. It depends upon the things
> > you are going to do with that, but it's very easy to get an infoleak
> > that way.
> >
> > > > OTOH, there's a good cause for moving some of the flags, MNT_LOCKED
> > > > included, out of ->mnt_flags and into a separate field in struct mount.
> > > > However, that would conflict with any code using that to deal with
> > > > your iterator safely.
> > > >
> > > > What's more, AFAICS in case of a stack of mounts each covering the root
> > > > of parent mount, you stop in each of those. The trouble is, umount(2)
> > > > propagation logics assumes that intermediate mounts can be pulled out of
> > > > such stack without causing trouble. For pathname resolution that is
> > > > true; it goes through the entire stack atomically wrt that stuff.
> > > > For your API that's not the case; somebody who has no idea about an
> > > > intermediate mount being there might get caught on it while it's getting
> > > > pulled from the stack.
> > > >
> > > > What exactly do you need around the mountpoint crossing?
> > >
> > > I thought about skipping intermediate mounts (that are hidden by
> > > other mounts). AFAICT, not skipping them will not cause any issue.
> >
> > It can. Suppose e.g. that /mnt gets propagation from another namespace,
> > but not the other way round and you mount something on /mnt.
> >
> > Later, in that another namespace, somebody mounts something on wherever
> > your /mnt gets propagation to. A copy will be propagated _between_
> > your /mnt and whatever you've mounted on top of it; it will be entirely
> > invisible until you umount your /mnt. At that point the propagated
> > copy will show up there, same as if it had appeared just after your
> > umount. Prior to that it's entirely invisible. If its original
> > counterpart in another namespace gets unmounted first, the copy will
> > be quietly pulled out.
>
> Thanks for sharing this information!
>
> > Note that choose_mountpoint_rcu() callers (including choose_mountpoint())
> > will have mount_lock seqcount sampled before the traversal _and_ recheck
> > it after having reached the bottom of stack. IOW, if you traverse ..
> > on the way to root, you won't get caught on the sucker being pulled out.
>
> In some of our internal discussions, we talked about using
> choose_mountpoint() instead of follow_up(). I didn't go that direction in this
> version because it requires holding "root". But if it makes more sense
> to use, choose_mountpoint(), we sure can hold "root".
>
> Alternatively, I think it is also OK to pass a zero'ed root to
> choose_mountpoint().
>
> > Your iterator, OTOH, would stop in that intermediate mount - and get
> > an unpleasant surprise when it comes back to do the next step (towards
> > /mnt on root filesystem, that is) and finds that path->mnt points
> > to something that is detached from everything - no way to get from
> > it any further. That - despite the fact that location you've started
> > from is still mounted, still has the same pathname, etc. and nothing
> > had been disrupted for it.
> >
> > And yes, landlock has a narrow race in the matching place. Needs to
> > be fixed. At least it does ignore those as far as any decisions are
> > concerned...
Thanks for pointing this out. In the case of Landlock, walking to a
disconnected mount point (because of this umount race condition) would
deny the requested access whereas it may be allowed otherwise. This is
not a security issue but still an issue because an event unrelated to
the request (umount) can abort a path resolution, which should not be
the case.
Without access to mount_lock, what would be the best way to fix this
Landlock issue while making it backportable?
>
> If we update path_parent in this patchset with choose_mountpoint(),
> and use it in Landlock, we will close this race condition, right?
choose_mountpoint() is currently private, but if we add a new filesystem
helper, I think the right approach would be to expose follow_dotdot(),
updating its arguments with public types. This way the intermediates
mount points will not be exposed, RCU optimization will be leveraged,
and usage of this new helper will be simplified.
>
> >
> > Note, BTW, that it might be better off by doing that similar to
> > d_path.c - without arseloads of dget_parent/dput et.al.; not sure
> > how feasible it is, but if everything in it can be done under
> > rcu_read_lock(), that's something to look into.
>
> I don't think we can do everything here inside rcu_read_lock().
> But d_path.c does have some code we can probably reuse or
> learn from. Also, we probably need two variations of iterators,
> one walk until absolute root, while the other walk until root of
> current->fs, just like d_path() vs. d_absolute_path(). Does this
> sound reasonable?
Passing the root to a public follow_dotdot() helper should do the job.
>
> Thanks,
> Song
>
next prev parent reply other threads:[~2025-05-30 12:20 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-28 22:26 [PATCH bpf-next 0/4] bpf path iterator Song Liu
2025-05-28 22:26 ` [PATCH bpf-next 1/4] namei: Introduce new helper function path_parent() Song Liu
2025-05-28 22:26 ` [PATCH bpf-next 2/4] landlock: Use path_parent() Song Liu
2025-05-31 13:51 ` Tingmao Wang
2025-06-02 13:36 ` Song Liu
2025-06-03 0:10 ` Song Liu
2025-06-03 12:47 ` Mickaël Salaün
2025-06-02 17:35 ` Mickaël Salaün
2025-06-02 22:56 ` Tingmao Wang
2025-05-28 22:26 ` [PATCH bpf-next 3/4] bpf: Introduce path iterator Song Liu
2025-05-28 22:37 ` Al Viro
2025-05-29 11:58 ` Jan Kara
2025-05-29 16:53 ` Song Liu
2025-05-29 16:57 ` Alexei Starovoitov
2025-05-29 17:05 ` Song Liu
2025-05-30 14:20 ` Mickaël Salaün
2025-06-02 9:41 ` Christian Brauner
2025-06-03 9:46 ` Jan Kara
2025-06-03 12:49 ` Mickaël Salaün
2025-06-03 21:13 ` Jan Kara
2025-05-29 17:38 ` Al Viro
2025-05-29 18:00 ` Song Liu
2025-05-29 18:35 ` Al Viro
2025-05-29 19:46 ` Song Liu
2025-05-29 20:15 ` Al Viro
2025-05-29 21:07 ` Song Liu
2025-05-29 21:45 ` Al Viro
2025-05-29 22:13 ` Song Liu
2025-05-29 23:10 ` Al Viro
2025-05-30 0:42 ` Song Liu
2025-05-30 12:20 ` Mickaël Salaün [this message]
2025-05-30 18:43 ` Al Viro
2025-05-31 8:39 ` Mickaël Salaün
2025-06-02 9:32 ` Christian Brauner
2025-05-30 18:55 ` Song Liu
2025-05-31 8:40 ` Mickaël Salaün
2025-05-31 14:05 ` Tingmao Wang
2025-06-01 23:33 ` Song Liu
2025-06-04 0:58 ` Tingmao Wang
2025-06-02 9:30 ` Christian Brauner
2025-06-02 9:27 ` Christian Brauner
2025-06-02 13:27 ` Song Liu
2025-06-02 15:40 ` Alexei Starovoitov
2025-06-02 21:39 ` Song Liu
2025-05-28 22:26 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for bpf " Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250530.euz5beesaSha@digikod.net \
--to=mic@digikod.net \
--cc=amir73il@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=gnoack@google.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=kpsingh@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=m@maowtm.org \
--cc=martin.lau@linux.dev \
--cc=mattbobrowski@google.com \
--cc=repnop@google.com \
--cc=song@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).