public inbox for bpf@vger.kernel.org
* Re: [BUG] KASAN: slab-use-after-free in link_path_walk
       [not found] <20260423013916.1589029-1-eulgyukim@snu.ac.kr>
@ 2026-04-23  4:39 ` Al Viro
  2026-04-23  5:19   ` Al Viro
  0 siblings, 1 reply; 2+ messages in thread
From: Al Viro @ 2026-04-23  4:39 UTC
  To: Eulgyu Kim
  Cc: brauner, jack, linux-fsdevel, linux-kernel, byoungyoung,
	jjy600901, Alexei Starovoitov, KaFai Wan, Yonghong Song, bpf

On Thu, Apr 23, 2026 at 10:39:16AM +0900, Eulgyu Kim wrote:

> We suspect there is a race condition between vfs_rmdir() and may_lookup()
> on the BPF pseudo filesystem. It seems that while link_path_walk() is walking
> a path, its call to may_lookup() checks permissions on the current directory
> inode through nd->inode, and vfs_rmdir() can remove that same directory and
> trigger inode destruction, leading to a use-after-free.

Not really.  What happens is that bpf does prompt freeing of struct inode, instead
of having it done with RCU delay.  Everything else is a result of that.

What's going on there?  It used to be in ->free_inode(); who had moved that into
->destroy_inode(), why had that been done, who had ACKed that and how have I
missed the discussions on fsdevel?

<digs>

commit 4f375ade6aa9f37fd72d7a78682f639772089eed
Author: KaFai Wan <kafai.wan@linux.dev>
Date:   Wed Oct 8 18:26:26 2025 +0800
 
     bpf: Avoid RCU context warning when unpinning htab with internal structs

[blocking stuff done from RCU-delayed callback, so let's make everything prompt,
whaddya mean, what was the delay for?]

Reported-by: Le Chen <tom2cat@sjtu.edu.cn>
Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/
Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251008102628.808045-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

OK, that answers some of that...  <looks the posting up>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org,
	yonghong.song@linux.dev, john.fastabend@gmail.com,
	kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
	jolsa@kernel.org, shuah@kernel.org, kafai.wan@linux.dev,
	toke@redhat.com, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org

... right, that probably answers the last one.  Incidentally, that commit has
brought back the old bug with cached symlink bodies getting freed without RCU delay.
It is possible that it was discussed on fsdevel at some point and I'd missed it
there, but...

Folks, the rules are simple:
	* anything that might be accessed in RCU mode (inode very much included
for objects that are visible in the tree) must be freed after RCU delay; that's
what ->free_inode() is for.
	* anything that can't be freed in such context should either be
dealt with in ->destroy_inode() (if it isn't needed for RCU-exposed methods)
or, if it really is needed for those, done via schedule_work() or equivalent
done by ->destroy_inode().
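
The split described by the two rules above can be modelled in plain userspace C. This is only a toy sketch: every toy_* name is made up for illustration, the "grace period" is an ordinary function call rather than real RCU, and the int counter stands in for whatever bpf_any_put() would release. toy_evict() imitates the way fs/inode.c calls ->destroy_inode() at once and defers ->free_inode().

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model; every toy_* name is hypothetical, not kernel API. */
struct toy_inode {
	int   is_symlink;   /* stands in for S_ISLNK(inode->i_mode) */
	char *link;         /* stands in for inode->i_link */
	int  *payload_refs; /* stands in for the object behind inode->i_private */
};

struct toy_super_ops {
	void (*destroy_inode)(struct toy_inode *); /* prompt, may block */
	void (*free_inode)(struct toy_inode *);    /* after the grace period */
};

struct toy_deferred {
	void (*fn)(struct toy_inode *);
	struct toy_inode *inode;
};

#define TOY_MAX_DEFERRED 16
static struct toy_deferred toy_queue[TOY_MAX_DEFERRED];
static int toy_queued;

/* Models the final teardown step: destroy now, free later. */
void toy_evict(const struct toy_super_ops *ops, struct toy_inode *inode)
{
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return; /* filesystem took over freeing entirely */
	}
	assert(toy_queued < TOY_MAX_DEFERRED);
	toy_queue[toy_queued].fn = ops->free_inode;
	toy_queue[toy_queued].inode = inode;
	toy_queued++;
}

/* Models the end of a grace period: no lockless reader is left. */
int toy_grace_period(void)
{
	int ran = toy_queued;

	for (int i = 0; i < toy_queued; i++) {
		if (toy_queue[i].fn)
			toy_queue[i].fn(toy_queue[i].inode);
		else
			free(toy_queue[i].inode); /* default: just free the inode */
	}
	toy_queued = 0;
	return ran;
}

/* Prompt part: drop the reference that may need a blocking context. */
void toy_bpf_destroy_inode(struct toy_inode *inode)
{
	if (inode->payload_refs)
		--*inode->payload_refs;
}

/* Delayed part: memory a lockless walker may still be reading. */
void toy_bpf_free_inode(struct toy_inode *inode)
{
	if (inode->is_symlink)
		free(inode->link);
	free(inode);
}

const struct toy_super_ops toy_bpf_ops = {
	.destroy_inode = toy_bpf_destroy_inode,
	.free_inode    = toy_bpf_free_inode,
};
```

After toy_evict() the counter has already dropped, yet inode->link stays readable until toy_grace_period() runs; that is the property RCU pathwalk relies on.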

Seeing that bpffs has the grand total of zero RCU-exposed methods (no ->d_compare(),
no ->d_hash(), no ->permission(), no ->d_revalidate(), no ->get_link()) I would
guess that it's the case of "have your bpf_any_put() done promptly, leave freeing
the inode and cached symlink body RCU-delayed".  Something like the delta below
(completely untested):

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 25c06a011825..bd052a8e89a9 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -762,14 +762,26 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	return 0;
 }
 
+// this is done promptly
 static void bpf_destroy_inode(struct inode *inode)
 {
 	enum bpf_type type;
 
-	if (S_ISLNK(inode->i_mode))
-		kfree(inode->i_link);
+	// better done here, since it's blocking and we'd need
+	// to use something like schedule_work() to do it from
+	// ->free_inode(); since this stuff doesn't need to
+	// be delayed, doing it here is less headache.
 	if (!bpf_inode_type(inode, &type))
 		bpf_any_put(inode->i_private, type);
+}
+
+// ... and this is done with RCU delay; anything that might be accessed
+// by RCU pathwalk (like, you know, inode and symlink contents) should be
+// dealt with here
+static void bpf_free_inode(struct inode *inode)
+{
+	if (S_ISLNK(inode->i_mode))
+		kfree(inode->i_link);
 	free_inode_nonrcu(inode);
 }
 
@@ -778,6 +790,7 @@ const struct super_operations bpf_super_ops = {
 	.drop_inode	= inode_just_drop,
 	.show_options	= bpf_show_options,
 	.destroy_inode	= bpf_destroy_inode,
+	.free_inode	= bpf_free_inode,
 };
 
 enum {


* Re: [BUG] KASAN: slab-use-after-free in link_path_walk
  2026-04-23  4:39 ` [BUG] KASAN: slab-use-after-free in link_path_walk Al Viro
@ 2026-04-23  5:19   ` Al Viro
  0 siblings, 0 replies; 2+ messages in thread
From: Al Viro @ 2026-04-23  5:19 UTC
  To: Eulgyu Kim
  Cc: brauner, jack, linux-fsdevel, linux-kernel, byoungyoung,
	jjy600901, Alexei Starovoitov, KaFai Wan, Yonghong Song, bpf

On Thu, Apr 23, 2026 at 05:39:06AM +0100, Al Viro wrote:
> Folks, the rules are simple:
> 	* anything that might be accessed in RCU mode (inode very much included
> for objects that are visible in the tree) must be freed after RCU delay; that's
> what ->free_inode() is for.
> 	* anything that can't be freed in such context should either be
> dealt with in ->destroy_inode() (if it isn't needed for RCU-exposed methods)
> or, if it really is needed for those, done via schedule_work() or equivalent
> done by ->destroy_inode().

If you do ->destroy_inode() alone, you must use an explicit call_rcu() in there
(or in ->evict_inode(), for that matter), with everything that must be RCU-delayed
done via that callback; strongly discouraged, though, since it's easier to leave
that to fs/inode.c by turning that callback into ->free_inode().
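
For comparison, the discouraged do-it-by-hand variant could look roughly like the userspace sketch below. demo_call_rcu() and demo_rcu_grace_period() are hypothetical stand-ins that merely queue and later run callbacks; they are not the kernel's call_rcu(), and struct demo_inode is not struct inode. The shape is the point: ->destroy_inode() does its prompt work, then hands the actual free to a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RCU callback head embedded in an object. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void (*func)(struct demo_rcu_head *);
};

static struct demo_rcu_head *demo_rcu_list;

/* Queue a callback; nothing runs until the grace period "ends". */
void demo_call_rcu(struct demo_rcu_head *head,
		   void (*func)(struct demo_rcu_head *))
{
	head->func = func;
	head->next = demo_rcu_list;
	demo_rcu_list = head;
}

/* Run all queued callbacks and return how many ran. */
int demo_rcu_grace_period(void)
{
	int ran = 0;

	while (demo_rcu_list) {
		struct demo_rcu_head *head = demo_rcu_list;

		demo_rcu_list = head->next; /* unlink before the callback frees head */
		head->func(head);
		ran++;
	}
	return ran;
}

struct demo_inode {
	int mode;                 /* something a lockless reader may look at */
	int *payload_refs;        /* stand-in for a reference dropped promptly */
	struct demo_rcu_head rcu; /* embedded head, recovered in the callback */
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Delayed part: runs only once the grace period is over. */
static void demo_free_cb(struct demo_rcu_head *head)
{
	free(demo_container_of(head, struct demo_inode, rcu));
}

/* ->destroy_inode() alone: prompt work first, then defer the free by hand. */
void demo_destroy_inode(struct demo_inode *inode)
{
	if (inode->payload_refs)
		--*inode->payload_refs;
	demo_call_rcu(&inode->rcu, demo_free_cb);
}
```

Compared with a separate ->free_inode(), the filesystem here has to carry its own callback and get the ordering right itself, which is the extra headache the paragraph above warns about.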

> Seeing that bpffs has the grand total of zero RCU-exposed methods (no ->d_compare(),
> no ->d_hash(), no ->permission(), no ->d_revalidate(), no ->get_link()) I would
> guess that it's the case of "have your bpf_any_put() done promptly, leave freeing
> the inode and cached symlink body RCU-delayed".

Other than bpffs there are only two instances of super_operations that have non-NULL
->destroy_inode() and NULL ->free_inode():
static const struct super_operations pipefs_ops = {
        .destroy_inode = free_inode_nonrcu,
        .statfs = simple_statfs,
};
which is fine, since pipefs inodes are not exposed to RCU pathwalk at all and
static const struct super_operations btrfs_test_super_ops = {
        .alloc_inode    = btrfs_alloc_inode,
        .destroy_inode  = btrfs_test_destroy_inode,
};
which is definitely not fine, but since that thing is not exposed to regular
syscalls (only to odd internal selftests, what with not being user-mountable),
presumably it gets away with that.  AFAICS, it may end up calling
	cond_resched_rwlock_write(&tree->lock);
from drop_all_extent_maps_fast(), from btrfs_drop_extent_map_range(), called
in btrfs_test_destroy_inode(), so it probably needs to leave that call
of btrfs_drop_extent_map_range() in ->destroy_inode() and use their
regular btrfs_free_inode() for ->free_inode().
