From: "Mickaël Salaün" <mic@digikod.net>
To: Song Liu <songliubraving@meta.com>
Cc: NeilBrown <neil@brown.name>,
"Christian Brauner" <brauner@kernel.org>,
"Tingmao Wang" <m@maowtm.org>, "Song Liu" <song@kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-security-module@vger.kernel.org"
<linux-security-module@vger.kernel.org>,
"Kernel Team" <kernel-team@meta.com>,
"andrii@kernel.org" <andrii@kernel.org>,
"eddyz87@gmail.com" <eddyz87@gmail.com>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"jack@suse.cz" <jack@suse.cz>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"mattbobrowski@google.com" <mattbobrowski@google.com>,
"Günther Noack" <gnoack@google.com>,
"Jann Horn" <jannh@google.com>
Subject: Re: [PATCH v5 bpf-next 0/5] bpf path iterator
Date: Thu, 24 Jul 2025 19:35:54 +0200 [thread overview]
Message-ID: <20250724.ij7AhF9quoow@digikod.net> (raw)
In-Reply-To: <2243B959-AA11-4D24-A6D0-0598E244BE3E@meta.com>
On Mon, Jul 14, 2025 at 09:09:42PM +0000, Song Liu wrote:
>
> > On Jul 9, 2025, at 11:28 PM, Song Liu <songliubraving@meta.com> wrote:
>
> [...]
>
> >>>> It isn't clear to me that vfs_walk_ancestors() needs to return anything.
> >>>> All the communication happens through walk_cb()
> >>>>
> >>>> walk_cb() is called with a path, the data, and a "may_sleep" flag.
> >>>> If it needs to sleep but may_sleep is not set, it returns "-ECHILD"
> >>>> which causes the walk to restart and use refcounts.
> >>>> If it wants to stop, it returns 0.
> >>>> If it wants to continue, it returns 1.
> >>>> If it wants a reference to the path then it can use (new)
> >>>> vfs_legitimize_path() which might fail.
> >>>> If it wants a reference to the path and may_sleep is true, it can use
> >>>> path_get() which won't fail.
> >>>>
> >>>> When returning -ECHILD (either because of a need to sleep or because
> >>>> vfs_legitimize_path() fails), walk_cb() would reset_data().
> >>>
> >>> This might actually work.
> >>>
> >>> My only concern is with vfs_legitimize_path. It is probably safer if
> >>> we only allow taking references with may_sleep==true, so that path_get
> >>> won’t fail. In this case, we will not need walk_cb() to call
> >>> vfs_legitimize_path. If the user want a reference, the walk_cb will
> >>> first return -ECHILD, and call path_get when may_sleep is true.
> >>
> >> What is your concern with vfs_legitimize_path() ??
> >>
> >> I've since realised that always restarting in response to -ECHILD isn't
> >> necessary and isn't how normal path-walk works. Restarting might be
> >> needed, but the first response to -ECHILD is to try legitimize_path().
> >> If that succeeds, then it is safe to sleep.
> >> So returning -ECHILD might just result in vfs_walk_ancestors() calling
> >> legitimize_path() and then calling walk_cb() again. Why not have
> >> walk_cb() do the vfs_legitimize_path() call (which will almost always
> >> succeed in practice).
> >
> > After reading the emails and the code more, I think I misunderstood
> > why we need to call vfs_legitimize_path(). The goal of “legitimize”
> > is to get a reference on @path, so a reference-less walk may not
> > need legitimize_path() at all. Do I get this right this time?
> >
> > However, I still have some concern with legitimize_path: it requires
> > m_seq and r_seq recorded at the beginning of the walk, do we want
> > to pass those to walk_cb()? IIUC, one of the reason we prefer a
> > callback based solution is that it doesn’t expose nameidata (or a
> > subset of it). Letting walk_cb to call legitimize_path appears to
> > defeat this benefit, no?
Yes, walk_cb() should be very light and non-blocking/non-sleepable. If
the caller cannot give these guarantees, then it can just pass NULL
instead of a valid walk_cb(), and continue the walk (if needed) by
calling the vfs_walk_ancentors() helper again, which would not benefit
from the RCU optimization in this case.
Before this patch series land, handling of disconnected directories
should be well defined, or at least let the caller deal with it. How do
you plan to handle disconnected directories for the eBPF use case? See
https://lore.kernel.org/all/20250719104204.545188-1-mic@digikod.net/
Unfortunately, this issue is not solved for Landlock yet.
> >
> >
> > A separate question below.
> >
> > I still have some question about how vfs_walk_ancestors() and the
> > walk_cb() interact. Let’s look at the landlock use case: the user
> > (landlock) just want to look at each ancestor, but doesn’t need to
> > take any references. walk_cb() will check @path against @root, and
> > return 0 when @path is the same as @root.
> >
> > IIUC, in this case, we will record m_seq and r_seq at the beginning
> > of vfs_walk_ancestors(), and check them against mount_lock and
> > rename_lock at the end of the walk. (Maybe we also need to check
> > them at some points before the end of the walk?) If either seq
> > changed during the walk, we need to restart the walk, and take
> > reference on each step. Did I get this right so far?
I think so. You should get some inspiration from prepend_path().
> >
> > If the above is right, here are my questions about the
> > reference-less walk above:
> >
> > 1. Which function (vfs_walk_ancestors or walk_cb) will check m_seq
> > and r_seq? I think vfs_walk_ancestors should check them.
Yes, walk_cb() should be as simple as possible: the simpler version
should just return a constant.
> > 2. When either seq changes, which function will call reset_data?
> > I think there are 3 options here:
> > 2.a: vfs_walk_ancestors calls reset_data, which will be another
> > callback function the caller passes to vfs_walk_ancestors.
> > 2.b: walk_cb will call reset_data(), but we need a mechanism to
> > tell walk_cb to do it, maybe a “restart” flag?
> > 2.c: Caller of vfs_walk_ancestors will call reset_data(). In
> > this case, vfs_walk_ancestors will return -ECHILD to its
> > caller. But I think this option is NACKed.
> >
> > I think the right solution is to have vfs_walk_ancestors check
> > m_seq and r_seq, and have walk_cb call reset_data. But this is
> > Different to the proposal above.
I'm not sure a reset_data() would be useful if walk_cb() never sleep.
If we really need such reset_data(), a fourth option would be for
walk_cb() to return a specific value (an enum instead of a bool) to
trigger the reset.
> >
> > Do my questions above make any sense? Or maybe I totally
> > misunderstood something?
>
> Hi Neil,
>
> Did my questions/comments above make sense? I am hoping we can
> agree on some design soon.
>
> Christian and Mickaël,
>
> Could you please also share your thoughts on this?
>
> Current requirements from BPF side is straightforward: we just
> need a mechanism to “walk up one level and hold reference”. So
> most of the requirement comes from LandLock side.
Have you thought about how to handle disconnected directories?
>
> Thanks,
> Song
>
next prev parent reply other threads:[~2025-07-24 17:44 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-17 6:11 [PATCH v5 bpf-next 0/5] bpf path iterator Song Liu
2025-06-17 6:11 ` [PATCH v5 bpf-next 1/5] namei: Introduce new helper function path_walk_parent() Song Liu
2025-06-18 1:02 ` kernel test robot
2025-06-24 12:18 ` Jan Kara
2025-06-24 17:37 ` Song Liu
2025-06-25 10:30 ` Jan Kara
2025-07-04 17:40 ` Yonghong Song
2025-07-06 23:54 ` Song Liu
2025-07-07 17:53 ` Yonghong Song
2025-06-17 6:11 ` [PATCH v5 bpf-next 2/5] landlock: Use path_walk_parent() Song Liu
2025-07-03 18:29 ` Mickaël Salaün
2025-07-03 22:27 ` Song Liu
2025-07-04 9:00 ` Mickaël Salaün
2025-07-06 22:29 ` Song Liu
2025-07-07 10:28 ` Christian Brauner
2025-06-17 6:11 ` [PATCH v5 bpf-next 3/5] bpf: Introduce path iterator Song Liu
2025-06-17 6:11 ` [PATCH v5 bpf-next 4/5] selftests/bpf: Add tests for bpf " Song Liu
2025-06-17 6:11 ` [PATCH v5 bpf-next 5/5] selftests/bpf: Path walk test Song Liu
2025-06-20 21:59 ` [PATCH v5 bpf-next 0/5] bpf path iterator Song Liu
2025-06-24 18:45 ` Mickaël Salaün
2025-06-24 21:38 ` NeilBrown
2025-06-25 13:14 ` Mickaël Salaün
2025-06-25 23:04 ` NeilBrown
2025-06-25 23:17 ` Song Liu
2025-06-26 0:07 ` Tingmao Wang
2025-06-26 1:05 ` NeilBrown
2025-06-26 5:52 ` Song Liu
2025-06-26 9:43 ` Mickaël Salaün
2025-06-26 14:49 ` Song Liu
2025-06-26 10:22 ` NeilBrown
2025-06-26 14:28 ` Song Liu
2025-06-26 22:51 ` NeilBrown
2025-06-27 0:21 ` Song Liu
2025-07-07 10:46 ` Christian Brauner
2025-07-07 11:17 ` Christian Brauner
2025-07-07 18:50 ` Song Liu
2025-07-09 16:06 ` Mickaël Salaün
2025-07-09 17:31 ` Song Liu
2025-07-09 22:24 ` NeilBrown
2025-07-09 22:50 ` Song Liu
2025-07-10 0:58 ` NeilBrown
2025-07-10 6:28 ` Song Liu
2025-07-14 21:09 ` Song Liu
2025-07-24 17:35 ` Mickaël Salaün [this message]
2025-07-26 9:52 ` Song Liu
2025-07-09 22:14 ` NeilBrown
2025-07-09 22:41 ` Song Liu
2025-07-10 0:58 ` NeilBrown
2025-07-07 10:43 ` Christian Brauner
2025-07-03 5:04 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250724.ij7AhF9quoow@digikod.net \
--to=mic@digikod.net \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=gnoack@google.com \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=kernel-team@meta.com \
--cc=kpsingh@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=m@maowtm.org \
--cc=martin.lau@linux.dev \
--cc=mattbobrowski@google.com \
--cc=neil@brown.name \
--cc=song@kernel.org \
--cc=songliubraving@meta.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).