linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "NeilBrown" <neil@brown.name>
To: "Song Liu" <songliubraving@meta.com>
Cc: "Tingmao Wang" <m@maowtm.org>, "Mickaël Salaün" <mic@digikod.net>,
	"Song Liu" <song@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-security-module@vger.kernel.org"
	<linux-security-module@vger.kernel.org>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"Kernel Team" <kernel-team@meta.com>,
	"andrii@kernel.org" <andrii@kernel.org>,
	"eddyz87@gmail.com" <eddyz87@gmail.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"martin.lau@linux.dev" <martin.lau@linux.dev>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jack@suse.cz" <jack@suse.cz>,
	"kpsingh@kernel.org" <kpsingh@kernel.org>,
	"mattbobrowski@google.com" <mattbobrowski@google.com>,
	"Günther Noack" <gnoack@google.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator
Date: Thu, 26 Jun 2025 20:22:29 +1000	[thread overview]
Message-ID: <175093334910.2280845.2994364473463803565@noble.neil.brown.name> (raw)
In-Reply-To: <9BD19ABC-08B8-4976-912D-DFCC06C29CAA@meta.com>

On Thu, 26 Jun 2025, Song Liu wrote:
> 
> 
> > On Jun 25, 2025, at 6:05 PM, NeilBrown <neil@brown.name> wrote:
> 
> [...]
> 
> >> 
> >> I can't speak for Mickaël, but a callback-based interface is less flexible
> >> (and _maybe_ less performant?).  Also, probably we will want to fallback
> >> to a reference-taking walk if the walk fails (rather than, say, retry
> >> infinitely), and this should probably use Song's proposed iterator.  I'm
> >> not sure if Song would be keen to rewrite this iterator patch series in
> >> callback style (to be clear, it doesn't necessarily seem like a good idea
> >> to me, and I'm not asking him to), which means that we will end up with
> >> the reference walk API being a "call this function repeatedly", and the
> >> rcu walk API taking a callback.  I think it is still workable (after all,
> >> if Landlock wants to reuse the code in the callback it can just call the
> >> callback function itself when doing the reference walk), but it seems a
> >> bit "ugly" to me.
> > 
> > call-back can have a performance impact (less opportunity for compiler
> > optimisation and CPU speculation), though less than taking spinlock and
> > references.  However Al and Christian have drawn a hard line against
> > making seq numbers visible outside VFS code so I think it is the
> > approach most likely to be accepted.
> > 
> > Certainly vfs_walk_ancestors() would fallback to ref-walk if rcu-walk
> > resulted in -ECHILD - just like all other path walking code in namei.c.
> > This would be largely transparent to the caller - the caller would only
> > see that the callback received a NULL path indicating a restart.  It
> > wouldn't need to know why.
> 
> I guess I misunderstood the proposal of vfs_walk_ancestors() 
> initially, so some clarification:
> 
> I think vfs_walk_ancestors() is good for the rcu-walk, and some 
> rcu-then-ref-walk. However, I don’t think it fits all use cases. 
> A reliable step-by-step ref-walk, like this set, works well with 
> BPF, and we want to keep it. 

The distinction between rcu-walk and ref-walk is an internal
implementation detail.  You as a caller shouldn't need to think about
the difference.  You just want to walk.  Note that LOOKUP_RCU is
documented in namei.h as "semi-internal".  The only uses outside of
core-VFS code is in individual filesystem's d_revalidate handler - they
are checking if they are allowed to sleep or not.  You should never
expect to pass LOOKUP_RCU to an VFS API - no other code does.

It might be reasonable for you as a caller to have some control over
whether the call can sleep or not.  LOOKUP_CACHED is a bit like that.
But for dotdot lookup the code will never sleep - so that is not
relevant.

I strongly suggest you stop thinking about rcu-walk vs ref-walk.  Think
about the needs of your code.  If you need a high-performance API, then
ask for a high-performance API, don't assume what form it will take or
what the internal implementation details will be.

I think you already have a clear answer that a step-by-step API will not
be read-only on the dcache (i.e.  it will adjust refcounts) and so will
not be high performance.  If you want high performance, you need to
accept a different style of API.

NeilBrown

  parent reply	other threads:[~2025-06-26 10:22 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-17  6:11 [PATCH v5 bpf-next 0/5] bpf path iterator Song Liu
2025-06-17  6:11 ` [PATCH v5 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
2025-06-18  1:02   ` kernel test robot
2025-06-24 12:18   ` Jan Kara
2025-06-24 17:37     ` Song Liu
2025-06-25 10:30       ` Jan Kara
2025-07-04 17:40   ` Yonghong Song
2025-07-06 23:54     ` Song Liu
2025-07-07 17:53       ` Yonghong Song
2025-06-17  6:11 ` [PATCH v5 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
2025-07-03 18:29   ` Mickaël Salaün
2025-07-03 22:27     ` Song Liu
2025-07-04  9:00       ` Mickaël Salaün
2025-07-06 22:29         ` Song Liu
2025-07-07 10:28         ` Christian Brauner
2025-06-17  6:11 ` [PATCH v5 bpf-next 3/5] bpf: Introduce path iterator Song Liu
2025-06-17  6:11 ` [PATCH v5 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
2025-06-17  6:11 ` [PATCH v5 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
2025-06-20 21:59 ` [PATCH v5 bpf-next 0/5] bpf path iterator Song Liu
2025-06-24 18:45   ` Mickaël Salaün
2025-06-24 21:38     ` NeilBrown
2025-06-25 13:14       ` Mickaël Salaün
2025-06-25 23:04         ` NeilBrown
2025-06-25 23:17           ` Song Liu
2025-06-26  0:07           ` Tingmao Wang
2025-06-26  1:05             ` NeilBrown
2025-06-26  5:52               ` Song Liu
2025-06-26  9:43                 ` Mickaël Salaün
2025-06-26 14:49                   ` Song Liu
2025-06-26 10:22                 ` NeilBrown [this message]
2025-06-26 14:28                   ` Song Liu
2025-06-26 22:51                     ` NeilBrown
2025-06-27  0:21                       ` Song Liu
2025-07-07 10:46                       ` Christian Brauner
2025-07-07 11:17                         ` Christian Brauner
2025-07-07 18:50                           ` Song Liu
2025-07-09 16:06                             ` Mickaël Salaün
2025-07-09 17:31                               ` Song Liu
2025-07-09 22:24                                 ` NeilBrown
2025-07-09 22:50                                   ` Song Liu
2025-07-10  0:58                                     ` NeilBrown
2025-07-10  6:28                                       ` Song Liu
2025-07-14 21:09                                         ` Song Liu
2025-07-24 17:35                                           ` Mickaël Salaün
2025-07-26  9:52                                             ` Song Liu
2025-07-09 22:14                             ` NeilBrown
2025-07-09 22:41                               ` Song Liu
2025-07-10  0:58                                 ` NeilBrown
2025-07-07 10:43               ` Christian Brauner
2025-07-03  5:04     ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=175093334910.2280845.2994364473463803565@noble.neil.brown.name \
    --to=neil@brown.name \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=gnoack@google.com \
    --cc=jack@suse.cz \
    --cc=kernel-team@meta.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=m@maowtm.org \
    --cc=martin.lau@linux.dev \
    --cc=mattbobrowski@google.com \
    --cc=mic@digikod.net \
    --cc=song@kernel.org \
    --cc=songliubraving@meta.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).