linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Mickaël Salaün" <mic@digikod.net>
To: Christian Brauner <brauner@kernel.org>
Cc: Tingmao Wang <m@maowtm.org>, Song Liu <song@kernel.org>,
	 Al Viro <viro@zeniv.linux.org.uk>,
	amir73il@gmail.com, andrii@kernel.org, ast@kernel.org,
	 bpf@vger.kernel.org, daniel@iogearbox.net, eddyz87@gmail.com,
	gnoack@google.com,  jack@suse.cz, jlayton@kernel.org,
	josef@toxicpanda.com, kernel-team@meta.com,  kpsingh@kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-security-module@vger.kernel.org, martin.lau@linux.dev,
	mattbobrowski@google.com,  repnop@google.com
Subject: Re: [PATCH v3 bpf-next 0/5] bpf path iterator
Date: Wed, 11 Jun 2025 17:39:00 +0200	[thread overview]
Message-ID: <20250611.faich0Chohg3@digikod.net> (raw)
In-Reply-To: <20250611-bindung-pulver-6158a3053c87@brauner>

On Wed, Jun 11, 2025 at 01:36:46PM +0200, Christian Brauner wrote:
> On Mon, Jun 09, 2025 at 09:08:34AM +0100, Tingmao Wang wrote:
> > On 6/9/25 07:23, Song Liu wrote:
> > > On Sun, Jun 8, 2025 at 10:34 AM Tingmao Wang <m@maowtm.org> wrote:
> > > [...]
> > >> Hi Song, Christian, Al and others,
> > >>
> > >> Previously I proposed in [1] to add ability to do a reference-less parent
> > >> walk for Landlock.  However, as Christian pointed out and I do agree in
> > >> hindsight, it is not a good idea to do things like this in non-VFS code.
> > >>
> > >> However, I still think this is valuable to consider given the performance
> > >> improvement, and after some discussion with Mickaël, I would like to
> > >> propose extending Song's helper to support such usage.  While I recognize
> > >> that this patch series is already in its v3, and I do not want to delay it
> > >> by too much, putting this proposal out now is still better than after this
> > >> has merged, so that we may consider signature changes.
> > >>
> > >> I've created a proof-of-concept and did some brief testing.  The
> > >> performance improvement attained here is the same as in [1] (with a "git
> > >> status" workload, median landlock overhead 35% -> 28%, median time in
> > >> landlock decreases by 26.6%).
> > >>
> > >> If this idea is accepted, I'm happy to work on it further, split out this
> > >> patch, update the comments and do more testing etc, potentially in
> > >> collaboration with Song.
> > >>
> > >> An alternative to this is perhaps to add a new helper
> > >> path_walk_parent_rcu, also living in namei.c, that will be used directly
> > >> by Landlock.  I'm happy to do it either way, but with some experimentation
> > >> I personally think that the code in this patch is still clean enough, and
> > >> can avoid some duplication.
> > >>
> > >> Patch title: path_walk_parent: support reference-less walk
> > >>
> > >> A later commit will update the BPF path iterator to use this.
> > >>
> > >> Signed-off-by: Tingmao Wang <m@maowtm.org>
> > > [...]
> > >>
> > >> -bool path_walk_parent(struct path *path, const struct path *root);
> > >> +struct parent_iterator {
> > >> +       struct path path;
> > >> +       struct path root;
> > >> +       bool rcu;
> > >> +       /* expected seq of path->dentry */
> > >> +       unsigned next_seq;
> > >> +       unsigned m_seq, r_seq;
> > > 
> > > Most of parent_iterator is not really used by reference walk.
> > > So it is probably just separate the two APIs?
> > 
> > I don't mind either way, but I feel like it might be nice to just have one
> > style of APIs (i.e. an iterator with start / end / next vs just one
> > function), even though this is not totally necessary for the ref-taking
> > walk.  After all, the BPF use case is iterator-based.  This also means
> > that the code at the user's side (mostly thinking of Landlock here) is
> > slightly simpler.
> > 
> > But I've not experimented with the other way.  I'm open to both, and I'm
> > happy to send a patch later for a separate API (in that case that would
> > not depend on this and I might just start a new series).
> > 
> > Would like to hear what VFS folks thinks of this first tho, and whether
> > there's any preference in one or two APIs.
> 
> I really dislike exposing the sequence number for mounts an for
> dentries. That's just nonsense and a non-VFS low-level consumer of this
> API has zero business caring about any of that. It's easy to
> misunderstand, it's easy to abuse so that's not a good way of doing
> this. It's the wrong API.
> 
> > 
> > > 
> > > Also, is it ok to make m_seq and r_seq available out of fs/?
> 
> No, it's not.
> 
> > 
> > The struct is not intended to be used directly by code outside.  Not sure
> 
> That doesn't mean anything. It's simply the wrong API if it has to spill
> so much of its bowels.
> 
> > what is the standard way to do this but we can make it private by e.g.
> > putting the seq values in another struct, if needed.  Alternatively I
> > think we can hide the entire struct behind an opaque pointer by doing the
> > allocation ourselves.
> > 
> > > 
> > >> +};
> > >> +
> > >> +#define PATH_WALK_PARENT_UPDATED               0
> > >> +#define PATH_WALK_PARENT_ALREADY_ROOT  -1
> > >> +#define PATH_WALK_PARENT_RETRY                 -2
> > >> +
> > >> +void path_walk_parent_start(struct parent_iterator *pit,
> > >> +                           const struct path *path, const struct path *root,
> > >> +                           bool ref_less);
> > >> +int path_walk_parent(struct parent_iterator *pit, struct path *next_parent);
> > >> +int path_walk_parent_end(struct parent_iterator *pit);
> > > 
> > > I think it is better to make this rcu walk a separate set of APIs.
> > > IOW, we will have:
> > > 
> > > int path_walk_parent(struct path *path, struct path *root);
> > > 
> > > and
> > > 
> > > void path_walk_parent_rcu_start(struct parent_iterator *pit,
> > >                            const struct path *path, const struct path *root);
> > > int path_walk_parent_rcu_next(struct parent_iterator *pit, struct path
> > > *next_parent);
> > > int path_walk_parent_rcu_end(struct parent_iterator *pit);
> > 
> > (replied above)
> 
> Exposing two sets of different APIs for essentially the same things is
> not going to happen.
> 
> The VFS doesn't expose a rcu variant and a non-rcu variant for itself so
> we are absolutely not going to do that for outside stuff.
> 
> It always does the try RCU first, then try to continue the walk by
> falling back to REF walk (e.g., via try_to_unlazy()). If that doesn't
> work then let the caller know and require them to decide whether to
> abort or redo everything in ref-walk.

That's indeed what is done by choose_mountpoint() (relying on
choose_mountpoint_rcu() when possible), but this proposal is about doing
a full path walk (i.e. multiple calls to path_walk_parent) within RCU.

> 
> There's zero need in that scheme for the caller to see any of the
> internals of the VFS and that's what you should aim for.

Yes, but how could we detect if a full path walk is invalid (because of
a rename or mount change)?

  reply	other threads:[~2025-06-11 15:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-06 21:30 [PATCH v3 bpf-next 0/5] bpf path iterator Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
2025-06-10 17:18   ` Mickaël Salaün
2025-06-10 17:26     ` Song Liu
2025-06-10 22:26       ` Tingmao Wang
2025-06-10 22:34         ` Tingmao Wang
2025-06-10 23:08         ` Song Liu
2025-06-11  0:23           ` Tingmao Wang
2025-06-10 23:34   ` NeilBrown
2025-06-11  0:56     ` Song Liu
2025-06-11 15:42       ` Mickaël Salaün
2025-06-11 16:31         ` Song Liu
2025-06-11 17:50           ` Tingmao Wang
2025-06-11 18:08             ` Song Liu
2025-06-12  9:01               ` Jan Kara
2025-06-12  9:49                 ` Jan Kara
2025-06-12 12:31                   ` Christian Brauner
2025-06-16  0:24                     ` Ref-less parent walk from Landlock (was: Re: [PATCH v3 bpf-next 1/5] namei: Introduce new helper function path_walk_parent()) Tingmao Wang
2025-06-17  6:20                       ` Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
2025-06-08 18:45   ` Tingmao Wang
2025-06-06 21:30 ` [PATCH v3 bpf-next 3/5] bpf: Introduce path iterator Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
2025-06-06 21:30 ` [PATCH v3 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
2025-06-08 17:32 ` [PATCH v3 bpf-next 0/5] bpf path iterator Tingmao Wang
2025-06-09  6:23   ` Song Liu
2025-06-09  8:08     ` Tingmao Wang
2025-06-11 11:36       ` Christian Brauner
2025-06-11 15:39         ` Mickaël Salaün [this message]
2025-06-08 17:32 ` Tingmao Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250611.faich0Chohg3@digikod.net \
    --to=mic@digikod.net \
    --cc=amir73il@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=gnoack@google.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@meta.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=m@maowtm.org \
    --cc=martin.lau@linux.dev \
    --cc=mattbobrowski@google.com \
    --cc=repnop@google.com \
    --cc=song@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).