linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <brauner@kernel.org>
To: NeilBrown <neil@brown.name>
Cc: "Song Liu" <songliubraving@meta.com>,
	"Tingmao Wang" <m@maowtm.org>, "Mickaël Salaün" <mic@digikod.net>,
	"Song Liu" <song@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-security-module@vger.kernel.org"
	<linux-security-module@vger.kernel.org>,
	"Kernel Team" <kernel-team@meta.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"eddyz87@gmail.com" <eddyz87@gmail.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"martin.lau@linux.dev" <martin.lau@linux.dev>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jack@suse.cz" <jack@suse.cz>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	"mattbobrowski@google.com" <mattbobrowski@google.com>,
	"Günther Noack" <gnoack@google.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator
Date: Mon, 7 Jul 2025 13:17:09 +0200	[thread overview]
Message-ID: <20250707-netto-campieren-501525a7d10a@brauner> (raw)
In-Reply-To: <20250707-kneifen-zielvereinbarungen-62c1ccdbb9c6@brauner>

On Mon, Jul 07, 2025 at 12:46:41PM +0200, Christian Brauner wrote:
> On Fri, Jun 27, 2025 at 08:51:21AM +1000, NeilBrown wrote:
> > On Fri, 27 Jun 2025, Song Liu wrote:
> > > 
> > > 
> > > > On Jun 26, 2025, at 3:22 AM, NeilBrown <neil@brown.name> wrote:
> > > 
> > > [...]
> > > 
> > > >> I guess I misunderstood the proposal of vfs_walk_ancestors() 
> > > >> initially, so some clarification:
> > > >> 
> > > >> I think vfs_walk_ancestors() is good for the rcu-walk, and some 
> > > >> rcu-then-ref-walk. However, I don’t think it fits all use cases. 
> > > >> A reliable step-by-step ref-walk, like this set, works well with 
> > > >> BPF, and we want to keep it.
> > > > 
> > > > The distinction between rcu-walk and ref-walk is an internal
> > > > implementation detail.  You as a caller shouldn't need to think about
> > > > the difference.  You just want to walk.  Note that LOOKUP_RCU is
> > > > documented in namei.h as "semi-internal".  The only uses outside of
> > > > core-VFS code is in individual filesystem's d_revalidate handler - they
> > > > are checking if they are allowed to sleep or not.  You should never
> > > > expect to pass LOOKUP_RCU to an VFS API - no other code does.
> > > > 
> > > > It might be reasonable for you as a caller to have some control over
> > > > whether the call can sleep or not.  LOOKUP_CACHED is a bit like that.
> > > > But for dotdot lookup the code will never sleep - so that is not
> > > > relevant.
> > > 
> > > Unfortunately, the BPF use case is more complicated. In some cases, 
> > > the callback function cannot be call in rcu critical sections. For 
> > > example, the callback may need to read xatter. For these cases, we
> > > we cannot use RCU walk at all. 
> > 
> > I really think you should stop using the terms RCU walk and ref-walk.  I
> > think they might be focusing your thinking in an unhelpful direction.
> 
> Thank you! I really appreciate you helping to shape this API and it
> aligns a lot with my thinking.
> 
> > The key issue about reading xattrs is that it might need to sleep.
> > Focusing on what might need to sleep and what will never need to sleep
> > is a useful approach - the distinction is wide spread in the kernel and
> > several function take a flag indicating if they are permitted to sleep,
> > or if failure when sleeping would be required.
> > 
> > So your above observation is better described as 
> > 
> >    The vfs_walk_ancestors() API has an (implicit) requirement that the
> >    callback mustn't sleep.  This is a problem for some use-cases
> >    where the call back might need to sleep - e.g. for accessing xattrs.
> > 
> > That is a good and useful observation.  I can see three possibly
> > responses:
> > 
> > 1/ Add a vfs_walk_ancestors_maysleep() API for which the callback is
> >    always allowed to sleep.  I don't particularly like this approach.
> 
> Agreed.
> 
> > 
> > 2/ Use repeated calls to vfs_walk_parent() when the handling of each
> >    ancestor might need to sleep.  I see no problem with supporting both
> >    vfs_walk_ancestors() and vfs_walk_parent().  There is plenty of
> >    precedent for having different  interfaces for different use cases.
> 
> Meh.
> 
> > 
> > 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback.
> 
> I think that's fine.

Ok, sorry for the delay but there's a lot of different things going on
right now and this one isn't exactly an easy thing to solve.

I mentioned this before and so did Neil: the lookup implementation
supports two modes sleeping and non-sleeping. That api is abstracted
away as heavily as possible by the VFS so that non-core code will not be
exposed to it other than in exceptional circumstances and doesn't have
to care about it.

It is a conceptual dead-end to expose these two modes via separate APIs
and leak this implementation detail into non-core code. It will not
happen as far as I'm concerned.

I very much understand the urge to get the refcount step-by-step thing
merged asap. Everyone wants their APIs merged fast. And if it's
reasonable to move fast we will (see the kernfs xattr thing).

But here are two use-cases that ask for the same thing with different
constraints that closely mirror our unified approach. Merging one
quickly just to have something and then later bolting the other one on
top, augmenting, or replacing, possible having to deprecate the old API
is just objectively nuts. That's how we end up with a spaghetthi helper
collection. We want as little helper fragmentation as possible.

We need a unified API that serves both use-cases. I dislike
callback-based APIs generally but we have precedent in the VFS for this
for cases where the internal state handling is delicate enough that it
should not be exposed (see __iterate_supers() which does exactly work
like Neil suggested down to the flag argument itself I added).

So I'm open to the callback solution.

(Note for really absurd perf requirements you could even make it work
with static calls I'm pretty sure.)

  reply	other threads:[~2025-07-07 11:17 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-17  6:11 [PATCH v5 bpf-next 0/5] bpf path iterator Song Liu
2025-06-17  6:11 ` [PATCH v5 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
2025-06-18  1:02   ` kernel test robot
2025-06-24 12:18   ` Jan Kara
2025-06-24 17:37     ` Song Liu
2025-06-25 10:30       ` Jan Kara
2025-07-04 17:40   ` Yonghong Song
2025-07-06 23:54     ` Song Liu
2025-07-07 17:53       ` Yonghong Song
2025-06-17  6:11 ` [PATCH v5 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
2025-07-03 18:29   ` Mickaël Salaün
2025-07-03 22:27     ` Song Liu
2025-07-04  9:00       ` Mickaël Salaün
2025-07-06 22:29         ` Song Liu
2025-07-07 10:28         ` Christian Brauner
2025-06-17  6:11 ` [PATCH v5 bpf-next 3/5] bpf: Introduce path iterator Song Liu
2025-06-17  6:11 ` [PATCH v5 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
2025-06-17  6:11 ` [PATCH v5 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
2025-06-20 21:59 ` [PATCH v5 bpf-next 0/5] bpf path iterator Song Liu
2025-06-24 18:45   ` Mickaël Salaün
2025-06-24 21:38     ` NeilBrown
2025-06-25 13:14       ` Mickaël Salaün
2025-06-25 23:04         ` NeilBrown
2025-06-25 23:17           ` Song Liu
2025-06-26  0:07           ` Tingmao Wang
2025-06-26  1:05             ` NeilBrown
2025-06-26  5:52               ` Song Liu
2025-06-26  9:43                 ` Mickaël Salaün
2025-06-26 14:49                   ` Song Liu
2025-06-26 10:22                 ` NeilBrown
2025-06-26 14:28                   ` Song Liu
2025-06-26 22:51                     ` NeilBrown
2025-06-27  0:21                       ` Song Liu
2025-07-07 10:46                       ` Christian Brauner
2025-07-07 11:17                         ` Christian Brauner [this message]
2025-07-07 18:50                           ` Song Liu
2025-07-09 16:06                             ` Mickaël Salaün
2025-07-09 17:31                               ` Song Liu
2025-07-09 22:24                                 ` NeilBrown
2025-07-09 22:50                                   ` Song Liu
2025-07-10  0:58                                     ` NeilBrown
2025-07-10  6:28                                       ` Song Liu
2025-07-14 21:09                                         ` Song Liu
2025-07-24 17:35                                           ` Mickaël Salaün
2025-07-26  9:52                                             ` Song Liu
2025-07-09 22:14                             ` NeilBrown
2025-07-09 22:41                               ` Song Liu
2025-07-10  0:58                                 ` NeilBrown
2025-07-07 10:43               ` Christian Brauner
2025-07-03  5:04     ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250707-netto-campieren-501525a7d10a@brauner \
    --to=brauner@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=gnoack@google.com \
    --cc=jack@suse.cz \
    --cc=kernel-team@meta.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=m@maowtm.org \
    --cc=martin.lau@linux.dev \
    --cc=mattbobrowski@google.com \
    --cc=mic@digikod.net \
    --cc=neil@brown.name \
    --cc=song@kernel.org \
    --cc=songliubraving@meta.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).