Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Mickaël Salaün" <mic@digikod.net>
To: Song Liu <song@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	 bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	 linux-security-module@vger.kernel.org, kernel-team@meta.com,
	andrii@kernel.org, eddyz87@gmail.com,  ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev, brauner@kernel.org,
	 kpsingh@kernel.org, mattbobrowski@google.com,
	amir73il@gmail.com, repnop@google.com,  jlayton@kernel.org,
	josef@toxicpanda.com, gnoack@google.com,
	 Tingmao Wang <m@maowtm.org>
Subject: Re: [PATCH bpf-next 3/4] bpf: Introduce path iterator
Date: Fri, 30 May 2025 14:20:39 +0200	[thread overview]
Message-ID: <20250530.euz5beesaSha@digikod.net> (raw)
In-Reply-To: <CAPhsuW6-J+NUe=jX51wGVP=nMFjETu+1LUTsWZiBa1ckwq7b+w@mail.gmail.com>

On Thu, May 29, 2025 at 05:42:16PM -0700, Song Liu wrote:
> On Thu, May 29, 2025 at 4:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Thu, May 29, 2025 at 03:13:10PM -0700, Song Liu wrote:
> >
> > > Is it an issue if we only hold a reference to a MNT_LOCKED mount for
> > > short period of time? "Short period" means it may get interrupted, page
> > > faults, or wait for an IO (read xattr), but it won't hold a reference to the
> > > mount and sleep indefinitely.
> >
> > MNT_LOCKED mount itself is not a problem.  What shouldn't be done is
> > looking around in the mountpoint it covers.  It depends upon the things
> > you are going to do with that, but it's very easy to get an infoleak
> > that way.
> >
> > > > OTOH, there's a good cause for moving some of the flags, MNT_LOCKED
> > > > included, out of ->mnt_flags and into a separate field in struct mount.
> > > > However, that would conflict with any code using that to deal with
> > > > your iterator safely.
> > > >
> > > > What's more, AFAICS in case of a stack of mounts each covering the root
> > > > of parent mount, you stop in each of those.  The trouble is, umount(2)
> > > > propagation logics assumes that intermediate mounts can be pulled out of
> > > > such stack without causing trouble.  For pathname resolution that is
> > > > true; it goes through the entire stack atomically wrt that stuff.
> > > > For your API that's not the case; somebody who has no idea about an
> > > > intermediate mount being there might get caught on it while it's getting
> > > > pulled from the stack.
> > > >
> > > > What exactly do you need around the mountpoint crossing?
> > >
> > > I thought about skipping intermediate mounts (that are hidden by
> > > other mounts). AFAICT, not skipping them will not cause any issue.
> >
> > It can.  Suppose e.g. that /mnt gets propagation from another namespace,
> > but not the other way round and you mount something on /mnt.
> >
> > Later, in that another namespace, somebody mounts something on wherever
> > your /mnt gets propagation to.  A copy will be propagated _between_
> > your /mnt and whatever you've mounted on top of it; it will be entirely
> > invisible until you umount your /mnt.  At that point the propagated
> > copy will show up there, same as if it had appeared just after your
> > umount.  Prior to that it's entirely invisible.  If its original
> > counterpart in another namespace gets unmounted first, the copy will
> > be quietly pulled out.
> 
> Thanks for sharing this information!
> 
> > Note that choose_mountpoint_rcu() callers (including choose_mountpoint())
> > will have mount_lock seqcount sampled before the traversal _and_ recheck
> > it after having reached the bottom of stack.  IOW, if you traverse ..
> > on the way to root, you won't get caught on the sucker being pulled out.
> 
> In some of our internal discussions, we talked about using
> choose_mountpoint() instead of follow_up(). I didn't go that direction in this
> version because it requires holding "root". But if it makes more sense
> to use, choose_mountpoint(), we sure can hold "root".
> 
> Alternatively, I think it is also OK to pass a zero'ed root to
> choose_mountpoint().
> 
> > Your iterator, OTOH, would stop in that intermediate mount - and get
> > an unpleasant surprise when it comes back to do the next step (towards
> > /mnt on root filesystem, that is) and finds that path->mnt points
> > to something that is detached from everything - no way to get from
> > it any further.  That - despite the fact that location you've started
> > from is still mounted, still has the same pathname, etc. and nothing
> > had been disrupted for it.
> >
> > And yes, landlock has a narrow race in the matching place.  Needs to
> > be fixed.  At least it does ignore those as far as any decisions are
> > concerned...

Thanks for pointing this out.  In the case of Landlock, walking to a
disconnected mount point (because of this umount race condition) would
deny the requested access whereas it may be allowed otherwise.  This is
not a security issue but still an issue because an event unrelated to
the request (umount) can abort a path resolution, which should not be
the case.

Without access to mount_lock, what would be the best way to fix this
Landlock issue while making it backportable?

> 
> If we update path_parent in this patchset with choose_mountpoint(),
> and use it in Landlock, we will close this race condition, right?

choose_mountpoint() is currently private, but if we add a new filesystem
helper, I think the right approach would be to expose follow_dotdot(),
updating its arguments with public types.  This way the intermediates
mount points will not be exposed, RCU optimization will be leveraged,
and usage of this new helper will be simplified.

> 
> >
> > Note, BTW, that it might be better off by doing that similar to
> > d_path.c - without arseloads of dget_parent/dput et.al.; not sure
> > how feasible it is, but if everything in it can be done under
> > rcu_read_lock(), that's something to look into.
> 
> I don't think we can do everything here inside rcu_read_lock().
> But d_path.c does have some code we can probably reuse or
> learn from. Also, we probably need two variations of iterators,
> one walk until absolute root, while the other walk until root of
> current->fs, just like d_path() vs. d_absolute_path(). Does this
> sound reasonable?

Passing the root to a public follow_dotdot() helper should do the job.

> 
> Thanks,
> Song
>

next prev parent reply	other threads:[~2025-05-30 12:27 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-28 22:26 [PATCH bpf-next 0/4] bpf path iterator Song Liu
2025-05-28 22:26 ` [PATCH bpf-next 1/4] namei: Introduce new helper function path_parent() Song Liu
2025-05-28 22:26 ` [PATCH bpf-next 2/4] landlock: Use path_parent() Song Liu
2025-05-31 13:51   ` Tingmao Wang
2025-06-02 13:36     ` Song Liu
2025-06-03  0:10       ` Song Liu
2025-06-03 12:47         ` Mickaël Salaün
2025-06-02 17:35     ` Mickaël Salaün
2025-06-02 22:56       ` Tingmao Wang
2025-05-28 22:26 ` [PATCH bpf-next 3/4] bpf: Introduce path iterator Song Liu
2025-05-28 22:37   ` Al Viro
2025-05-29 11:58     ` Jan Kara
2025-05-29 16:53       ` Song Liu
2025-05-29 16:57         ` Alexei Starovoitov
2025-05-29 17:05           ` Song Liu
2025-05-30 14:20             ` Mickaël Salaün
2025-06-02  9:41               ` Christian Brauner
2025-06-03  9:46               ` Jan Kara
2025-06-03 12:49                 ` Mickaël Salaün
2025-06-03 21:13                   ` Jan Kara
2025-05-29 17:38         ` Al Viro
2025-05-29 18:00           ` Song Liu
2025-05-29 18:35             ` Al Viro
2025-05-29 19:46               ` Song Liu
2025-05-29 20:15                 ` Al Viro
2025-05-29 21:07                   ` Song Liu
2025-05-29 21:45                     ` Al Viro
2025-05-29 22:13                       ` Song Liu
2025-05-29 23:10                         ` Al Viro
2025-05-30  0:42                           ` Song Liu
2025-05-30 12:20                             ` Mickaël Salaün [this message]
2025-05-30 18:43                               ` Al Viro
2025-05-31  8:39                                 ` Mickaël Salaün
2025-06-02  9:32                                 ` Christian Brauner
2025-05-30 18:55                               ` Song Liu
2025-05-31  8:40                                 ` Mickaël Salaün
2025-05-31 14:05                                 ` Tingmao Wang
2025-06-01 23:33                                   ` Song Liu
2025-06-04  0:58                             ` Tingmao Wang
2025-06-02  9:30                           ` Christian Brauner
2025-06-02  9:27             ` Christian Brauner
2025-06-02 13:27               ` Song Liu
2025-06-02 15:40                 ` Alexei Starovoitov
2025-06-02 21:39                   ` Song Liu
2025-05-28 22:26 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for bpf " Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250530.euz5beesaSha@digikod.net \
    --to=mic@digikod.net \
    --cc=amir73il@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=gnoack@google.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@meta.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=m@maowtm.org \
    --cc=martin.lau@linux.dev \
    --cc=mattbobrowski@google.com \
    --cc=repnop@google.com \
    --cc=song@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.