* Re: [BUG] KASAN: slab-use-after-free in link_path_walk
From: Al Viro @ 2026-04-23 4:39 UTC
To: Eulgyu Kim
Cc: brauner, jack, linux-fsdevel, linux-kernel, byoungyoung,
jjy600901, Alexei Starovoitov, KaFai Wan, Yonghong Song, bpf
On Thu, Apr 23, 2026 at 10:39:16AM +0900, Eulgyu Kim wrote:
> We suspect there is a race condition between vfs_rmdir() and may_lookup()
> on the BPF pseudo filesystem. It seems that while link_path_walk() is walking
> a path, its call to may_lookup() checks permissions on the current directory
> inode through nd->inode, and vfs_rmdir() can remove that same directory and
> trigger inode destruction, leading to a use-after-free.
Not really. What happens is that bpf does prompt freeing of struct inode, instead
of having it done with RCU delay. Everything else is a result of that.
What's going on there? It used to be in ->free_inode(); who moved that into
->destroy_inode(), why was that done, who ACKed it and how did I
miss the discussions on fsdevel?
<digs>
commit 4f375ade6aa9f37fd72d7a78682f639772089eed
Author: KaFai Wan <kafai.wan@linux.dev>
Date: Wed Oct 8 18:26:26 2025 +0800
bpf: Avoid RCU context warning when unpinning htab with internal structs
[blocking stuff done from RCU-delayed callback, so let's make everything prompt,
whaddya mean, what was the delay for?]
Reported-by: Le Chen <tom2cat@sjtu.edu.cn>
Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/
Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251008102628.808045-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
OK, that answers some of that... <looks the posting up>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org,
yonghong.song@linux.dev, john.fastabend@gmail.com,
kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
jolsa@kernel.org, shuah@kernel.org, kafai.wan@linux.dev,
toke@redhat.com, linux-kernel@vger.kernel.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org
... right, that probably answers the last one. Incidentally, that commit has
brought back the old bug with cached symlink bodies getting freed without RCU delay.
It is possible that it was discussed on fsdevel at some point and I'd missed it
there, but...
Folks, the rules are simple:
* anything that might be accessed in RCU mode (inode very much included
for objects that are visible in the tree) must be freed after RCU delay; that's
what ->free_inode() is for.
* anything that can't be freed in such context (i.e. from an RCU-delayed
callback, where blocking is not allowed) should either be
dealt with in ->destroy_inode() (if it isn't needed for RCU-exposed methods)
or, if it really is needed for those, done via schedule_work() or equivalent
done by ->destroy_inode().
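To make the split between the two rules concrete, here is a minimal userspace model. It is not kernel code: every toy_* name is made up for illustration, and the "grace period" is just a reader counter standing in for real RCU. The prompt stage plays the role of ->destroy_inode() and the delayed stage plays the role of ->free_inode().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace toy model; all toy_* names are hypothetical. */
struct toy_inode {
	char *link;              /* stands in for inode->i_link */
	bool payload_held;       /* stands in for the object behind ->i_private */
	struct toy_inode *next;  /* linkage on the deferred-free list */
};

static int toy_readers;                 /* lockless walkers currently active */
static struct toy_inode *toy_deferred;  /* inodes waiting for the grace period */
static int toy_freed;                   /* inodes actually freed so far */

/* Prompt stage: may block, touches nothing a lockless walker looks at. */
static void toy_destroy_inode(struct toy_inode *inode)
{
	inode->payload_held = false;    /* stands in for bpf_any_put() */
}

/* Delayed stage: frees what a lockless walker might still dereference. */
static void toy_free_inode(struct toy_inode *inode)
{
	free(inode->link);
	free(inode);
	toy_freed++;
}

/* Eviction: run the prompt stage now, queue the delayed stage. */
static void toy_evict(struct toy_inode *inode)
{
	toy_destroy_inode(inode);
	inode->next = toy_deferred;
	toy_deferred = inode;
}

/* Crude grace period: queued inodes are freed only once no walker is active. */
static void toy_grace_period(void)
{
	if (toy_readers)
		return;
	while (toy_deferred) {
		struct toy_inode *inode = toy_deferred;

		toy_deferred = inode->next;
		toy_free_inode(inode);
	}
}

/* Returns how many inodes were freed once the walker had left. */
int toy_demo(void)
{
	int before = toy_freed;
	struct toy_inode *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return -1;
	inode->link = malloc(sizeof("target"));
	if (!inode->link) {
		free(inode);
		return -1;
	}
	memcpy(inode->link, "target", sizeof("target"));
	inode->payload_held = true;

	toy_readers++;          /* walker enters, holding no reference */
	toy_evict(inode);       /* the rmdir/unlink side */
	toy_grace_period();     /* no-op: a walker is still active */

	assert(!inode->payload_held);                /* prompt stage already ran */
	assert(toy_freed == before);                 /* memory not freed yet */
	assert(strcmp(inode->link, "target") == 0);  /* walker can still read it */

	toy_readers--;          /* walker leaves */
	toy_grace_period();     /* now the delayed stage runs */
	return toy_freed - before;
}
```

Calling toy_demo() from a main() should return 1: the payload is dropped immediately, while the inode and its symlink body survive until the walker is gone. Moving the free() calls into toy_destroy_inode() would reproduce, in miniature, the kind of use-after-free being reported.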
Seeing that bpffs has the grand total of zero RCU-exposed methods (no ->d_compare(),
no ->d_hash(), no ->permission(), no ->d_revalidate(), no ->get_link()) I would
guess that it's the case of "have your bpf_any_put() done promptly, leave freeing
the inode and cached symlink body RCU-delayed". Something like the delta below
(completely untested):
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 25c06a011825..bd052a8e89a9 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -762,14 +762,26 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	return 0;
 }
 
+// this is done promptly
 static void bpf_destroy_inode(struct inode *inode)
 {
 	enum bpf_type type;
 
-	if (S_ISLNK(inode->i_mode))
-		kfree(inode->i_link);
+	// better done here, since it's blocking and we'd need
+	// to use something like schedule_work() to do it from
+	// ->free_inode(); since this stuff doesn't need to
+	// be delayed, doing it here is less headache.
 	if (!bpf_inode_type(inode, &type))
 		bpf_any_put(inode->i_private, type);
+}
+
+// ... and this is done with RCU delay; anything that might be accessed
+// by RCU pathwalk (like, you know, inode and symlink contents) should be
+// dealt with here
+static void bpf_free_inode(struct inode *inode)
+{
+	if (S_ISLNK(inode->i_mode))
+		kfree(inode->i_link);
 	free_inode_nonrcu(inode);
 }
 
@@ -778,6 +790,7 @@ const struct super_operations bpf_super_ops = {
 	.drop_inode	= inode_just_drop,
 	.show_options	= bpf_show_options,
 	.destroy_inode	= bpf_destroy_inode,
+	.free_inode	= bpf_free_inode,
 };
 
 enum {