From: "Mickaël Salaün" <mic@digikod.net>
To: Christian Brauner <brauner@kernel.org>
Cc: Tingmao Wang <m@maowtm.org>, Song Liu <song@kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
amir73il@gmail.com, andrii@kernel.org, ast@kernel.org,
bpf@vger.kernel.org, daniel@iogearbox.net, eddyz87@gmail.com,
gnoack@google.com, jack@suse.cz, jlayton@kernel.org,
josef@toxicpanda.com, kernel-team@meta.com, kpsingh@kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, martin.lau@linux.dev,
mattbobrowski@google.com, repnop@google.com
Subject: Re: [PATCH v3 bpf-next 0/5] bpf path iterator
Date: Wed, 11 Jun 2025 17:39:00 +0200 [thread overview]
Message-ID: <20250611.faich0Chohg3@digikod.net> (raw)
In-Reply-To: <20250611-bindung-pulver-6158a3053c87@brauner>
On Wed, Jun 11, 2025 at 01:36:46PM +0200, Christian Brauner wrote:
> On Mon, Jun 09, 2025 at 09:08:34AM +0100, Tingmao Wang wrote:
> > On 6/9/25 07:23, Song Liu wrote:
> > > On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@maowtm.org> wrote:
> > > [...]
> > >> Hi Song, Christian, Al and others,
> > >>
> > >> Previously I proposed in [1] to add ability to do a reference-less parent
> > >> walk for Landlock. However, as Christian pointed out and I do agree in
> > >> hindsight, it is not a good idea to do things like this in non-VFS code.
> > >>
> > >> However, I still think this is valuable to consider given the performance
> > >> improvement, and after some discussion with Mickaël, I would like to
> > >> propose extending Song's helper to support such usage. While I recognize
> > >> that this patch series is already in its v3, and I do not want to delay it
> > >> by too much, putting this proposal out now is still better than after this
> > >> has merged, so that we may consider signature changes.
> > >>
> > >> I've created a proof-of-concept and did some brief testing. The
> > >> performance improvement attained here is the same as in [1] (with a "git
> > >> status" workload, median landlock overhead 35% -> 28%, median time in
> > >> landlock decreases by 26.6%).
> > >>
> > >> If this idea is accepted, I'm happy to work on it further, split out this
> > >> patch, update the comments and do more testing etc, potentially in
> > >> collaboration with Song.
> > >>
> > >> An alternative to this is perhaps to add a new helper
> > >> path_walk_parent_rcu, also living in namei.c, that will be used directly
> > >> by Landlock. I'm happy to do it either way, but with some experimentation
> > >> I personally think that the code in this patch is still clean enough, and
> > >> can avoid some duplication.
> > >>
> > >> Patch title: path_walk_parent: support reference-less walk
> > >>
> > >> A later commit will update the BPF path iterator to use this.
> > >>
> > >> Signed-off-by: Tingmao Wang <m@maowtm.org>
> > > [...]
> > >>
> > >> -bool path_walk_parent(struct path *path, const struct path *root);
> > >> +struct parent_iterator {
> > >> + struct path path;
> > >> + struct path root;
> > >> + bool rcu;
> > >> + /* expected seq of path->dentry */
> > >> + unsigned next_seq;
> > >> + unsigned m_seq, r_seq;
> > >
> > > Most of parent_iterator is not really used by reference walk.
> > > So it is probably just separate the two APIs?
> >
> > I don't mind either way, but I feel like it might be nice to just have one
> > style of APIs (i.e. an iterator with start / end / next vs just one
> > function), even though this is not totally necessary for the ref-taking
> > walk. After all, the BPF use case is iterator-based. This also means
> > that the code at the user's side (mostly thinking of Landlock here) is
> > slightly simpler.
> >
> > But I've not experimented with the other way. I'm open to both, and I'm
> > happy to send a patch later for a separate API (in that case that would
> > not depend on this and I might just start a new series).
> >
> > Would like to hear what VFS folks thinks of this first tho, and whether
> > there's any preference in one or two APIs.
>
> I really dislike exposing the sequence number for mounts an for
> dentries. That's just nonsense and a non-VFS low-level consumer of this
> API has zero business caring about any of that. It's easy to
> misunderstand, it's easy to abuse so that's not a good way of doing
> this. It's the wrong API.
>
> >
> > >
> > > Also, is it ok to make m_seq and r_seq available out of fs/?
>
> No, it's not.
>
> >
> > The struct is not intended to be used directly by code outside. Not sure
>
> That doesn't mean anything. It's simply the wrong API if it has to spill
> so much of its bowels.
>
> > what is the standard way to do this but we can make it private by e.g.
> > putting the seq values in another struct, if needed. Alternatively I
> > think we can hide the entire struct behind an opaque pointer by doing the
> > allocation ourselves.
> >
> > >
> > >> +};
> > >> +
> > >> +#define PATH_WALK_PARENT_UPDATED 0
> > >> +#define PATH_WALK_PARENT_ALREADY_ROOT -1
> > >> +#define PATH_WALK_PARENT_RETRY -2
> > >> +
> > >> +void path_walk_parent_start(struct parent_iterator *pit,
> > >> + const struct path *path, const struct path *root,
> > >> + bool ref_less);
> > >> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
> > >> +int path_walk_parent_end(struct parent_iterator *pit);
> > >
> > > I think it is better to make this rcu walk a separate set of APIs.
> > > IOW, we will have:
> > >
> > > int path_walk_parent(struct path *path, struct path *root);
> > >
> > > and
> > >
> > > void path_walk_parent_rcu_start(struct parent_iterator *pit,
> > > const struct path *path, const struct path *root);
> > > int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
> > > *next_parent);
> > > int path_walk_parent_rcu_end(struct parent_iterator *pit);
> >
> > (replied above)
>
> Exposing two sets of different APIs for essentially the same things is
> not going to happen.
>
> The VFS doesn't expose a rcu variant and a non-rcu variant for itself so
> we are absolutely not going to do that for outside stuff.
>
> It always does the try RCU first, then try to continue the walk by
> falling back to REF walk (e.g., via try_to_unlazy()). If that doesn't
> work then let the caller know and require them to decide whether to
> abort or redo everything in ref-walk.
That's indeed what is done by choose_mountpoint() (relying on
choose_mountpoint_rcu() when possible), but this proposal is about doing
a full path walk (i.e. multiple calls to path_walk_parent) within RCU.
>
> There's zero need in that scheme for the caller to see any of the
> internals of the VFS and that's what you should aim for.
Yes, but how could we detect if a full path walk is invalid (because of
a rename or mount change)?
next prev parent reply other threads:[~2025-06-11 15:39 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
2025-06-10 17:18 ` Mickaël Salaün
2025-06-10 17:26 ` Song Liu
2025-06-10 22:26 ` Tingmao Wang
2025-06-10 22:34 ` Tingmao Wang
2025-06-10 23:08 ` Song Liu
2025-06-11 0:23 ` Tingmao Wang
2025-06-10 23:34 ` NeilBrown
2025-06-11 0:56 ` Song Liu
2025-06-11 15:42 ` Mickaël Salaün
2025-06-11 16:31 ` Song Liu
2025-06-11 17:50 ` Tingmao Wang
2025-06-11 18:08 ` Song Liu
2025-06-12 9:01 ` Jan Kara
2025-06-12 9:49 ` Jan Kara
2025-06-12 12:31 ` Christian Brauner
2025-06-16 0:24 ` Ref-less parent walk from Landlock (was: Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()) Tingmao Wang
2025-06-17 6:20 ` Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
2025-06-08 18:45 ` Tingmao Wang
2025-06-06 21:30 ` [PATCH v3 bpf-next 3/5] bpf: Introduce path iterator Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
2025-06-08 17:32 ` [PATCH v3 bpf-next 0/5] bpf path iterator Tingmao Wang
2025-06-09 6:23 ` Song Liu
2025-06-09 8:08 ` Tingmao Wang
2025-06-11 11:36 ` Christian Brauner
2025-06-11 15:39 ` Mickaël Salaün [this message]
2025-06-08 17:32 ` Tingmao Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250611.faich0Chohg3@digikod.net \
--to=mic@digikod.net \
--cc=amir73il@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=gnoack@google.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=kpsingh@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=m@maowtm.org \
--cc=martin.lau@linux.dev \
--cc=mattbobrowski@google.com \
--cc=repnop@google.com \
--cc=song@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).