From: Christian Brauner <brauner@kernel.org>
To: Tingmao Wang <m@maowtm.org>
Cc: "Al Viro" <viro@zeniv.linux.org.uk>,
"Mickaël Salaün" <mic@digikod.net>, "Song Liu" <song@kernel.org>,
"Günther Noack" <gnoack@google.com>, "Jan Kara" <jack@suse.cz>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
linux-security-module@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [RFC PATCH 1/3] landlock: walk parent dir without taking references
Date: Fri, 6 Jun 2025 12:25:42 +0200 [thread overview]
Message-ID: <20250606-ekstase-geloben-9d8916729756@brauner> (raw)
In-Reply-To: <5c8476df-56c4-4dd1-b5c8-40cb604eae62@maowtm.org>
On Wed, Jun 04, 2025 at 10:05:01PM +0100, Tingmao Wang wrote:
> On 6/4/25 18:15, Mickaël Salaün wrote:
> > On Wed, Jun 04, 2025 at 01:45:43AM +0100, Tingmao Wang wrote:
> >> [..]
> >> @@ -897,10 +898,14 @@ static bool is_access_to_paths_allowed(
> >> break;
> >> jump_up:
> >> if (walker_path.dentry == walker_path.mnt->mnt_root) {
> >> + /* follow_up gets the parent and puts the passed in path */
> >> + path_get(&walker_path);
> >> if (follow_up(&walker_path)) {
> >> + path_put(&walker_path);
> >
> > path_put() cannot be safely called in a RCU read-side critical section
> > because it can free memory which can sleep, and also because it can wait
> > for a lock. However, we can call rcu_read_unlock() before and
> > rcu_read_lock() after (if we hold a reference).
>
> Thanks for pointing this out.
>
> Actually I think this might be even more tricky. I'm not sure if we can
> always rely on the dentry still being there after rcu_read_unlock(),
> regardless of whether we do a path_get() before unlocking... Even when
> we're inside a RCU read-side critical section, my understanding is that if
> a dentry reaches zero refcount and is selected to be freed (either
> immediately or by LRU) from another CPU, dentry_free will do
> call_rcu(&dentry->d_u.d_rcu, __d_free) which will cause the dentry to
> immediately be freed after our rcu_read_unlock(), regardless of whether we
> had a path_get() before that.
>
> In fact because lockref_mark_dead sets the refcount to negative,
> path_get() would simply be wrong. We could use lockref_get_not_dead()
> instead, and only continue if we actually acquired a reference, but then
> we have the problem of not being able to dput() the dentry acquired by
> follow_up(), without risking it getting killed before we can enter RCU
> again (although I do wonder if it's possible for it to be killed, given
> that there is an active mountpoint on it that we hold a reference for?).
>
> While we could probably do something like "defer the dput() until we next
> reach a mountpoint and can rcu_read_unlock()", or use lockref_put_return()
> and assert that the dentry must still have refcount > 0 since it's an
> in-use mountpoint, after a lot of thinking it seems to me the only clean
> solution is to have a mechanism of walking up mounts completely
> reference-free. Maybe what we actually need is choose_mountpoint_rcu().
>
> That function is private, so I guess a question for Al and other VFS
> people here is, can we potentially expose an equivalent publicly?
> (Perhaps it would only do effectively what __prepend_path in d_path.c
> does, and we can track the mount_lock seqcount outside. Also the fact
> that throughout all this we have a valid reference to the leaf dentry we
> started from, to me should mean that the mount can't disappear under us
> anyway)
I have a very hard time warming up to the idea that you're doing
reference less path walk out of sight of the core VFS. That smells like
a massive landmine. It's barely correct now and it won't get better if
this code bitrots when Al or I massage other codepaths. And that happens
a lot.
We're exposing new infrastructure for the BPF LSM layer that is targeted
also to help Landlock. Use the helper exposed for them. It makes things
easier for all of us.
>
> >
> >> /* Ignores hidden mount points. */
> >> goto jump_up;
> >> } else {
> >> + path_put(&walker_path);
> >> /*
> >> * Stops at the real root. Denies access
> >> * because not all layers have granted access.
> >> @@ -920,11 +925,11 @@ static bool is_access_to_paths_allowed(
> >> }
> >> break;
> >> }
> >> - parent_dentry = dget_parent(walker_path.dentry);
> >> - dput(walker_path.dentry);
> >> + parent_dentry = walker_path.dentry->d_parent;
> >> walker_path.dentry = parent_dentry;
> >> }
> >> - path_put(&walker_path);
> >> +
> >> + rcu_read_unlock();
> >>
> >> if (!allowed_parent1) {
> >> log_request_parent1->type = LANDLOCK_REQUEST_FS_ACCESS;
> >> @@ -1045,12 +1050,11 @@ static bool collect_domain_accesses(
> >> layer_masks_dom,
> >> LANDLOCK_KEY_INODE);
> >>
> >> - dget(dir);
> >> - while (true) {
> >> - struct dentry *parent_dentry;
> >> + rcu_read_lock();
> >>
> >> + while (true) {
> >> /* Gets all layers allowing all domain accesses. */
> >> - if (landlock_unmask_layers(find_rule(domain, dir), access_dom,
> >> + if (landlock_unmask_layers(find_rule_rcu(domain, dir), access_dom,
> >> layer_masks_dom,
> >> ARRAY_SIZE(*layer_masks_dom))) {
> >> /*
> >> @@ -1065,11 +1069,11 @@ static bool collect_domain_accesses(
> >> if (dir == mnt_root || WARN_ON_ONCE(IS_ROOT(dir)))
> >> break;
> >>
> >> - parent_dentry = dget_parent(dir);
> >> - dput(dir);
> >> - dir = parent_dentry;
> >> + dir = dir->d_parent;
> >> }
> >> - dput(dir);
> >> +
> >> + rcu_read_unlock();
> >> +
> >> return ret;
> >> }
> >>
> >> --
> >> 2.49.0
> >>
> >>
next prev parent reply other threads:[~2025-06-06 10:25 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-04 0:45 [RFC PATCH 0/3] landlock: walk parent dir with RCU, without taking references Tingmao Wang
2025-06-04 0:45 ` [RFC PATCH 1/3] landlock: walk parent dir " Tingmao Wang
2025-06-04 17:15 ` Mickaël Salaün
2025-06-04 21:05 ` Tingmao Wang
2025-06-06 10:25 ` Christian Brauner [this message]
2025-06-04 0:45 ` [RFC PATCH 2/3] selftests/landlock: Add fs_race_test Tingmao Wang
2025-06-04 0:45 ` [RFC PATCH 3/3] Restart pathwalk on rename seqcount change Tingmao Wang
2025-06-04 0:55 ` Al Viro
2025-06-04 1:12 ` Tingmao Wang
2025-06-04 2:21 ` Al Viro
2025-06-04 18:56 ` Tingmao Wang
2025-06-04 19:17 ` Tingmao Wang
2025-06-04 1:09 ` Tingmao Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250606-ekstase-geloben-9d8916729756@brauner \
--to=brauner@kernel.org \
--cc=alexei.starovoitov@gmail.com \
--cc=gnoack@google.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=m@maowtm.org \
--cc=mic@digikod.net \
--cc=song@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox