* Re: [BUG] KASAN: slab-use-after-free in link_path_walk
From: Al Viro @ 2026-04-23 5:19 UTC
To: Eulgyu Kim
Cc: brauner, jack, linux-fsdevel, linux-kernel, byoungyoung,
jjy600901, Alexei Starovoitov, KaFai Wan, Yonghong Song, bpf
On Thu, Apr 23, 2026 at 05:39:06AM +0100, Al Viro wrote:
> Folks, the rules are simple:
> * anything that might be accessed in RCU mode (inode very much included
> for objects that are visible in the tree) must be freed after RCU delay; that's
> what ->free_inode() is for.
> * anything that can't be freed in such context should either be
> dealt with in ->destroy_inode() (if it isn't needed for RCU-exposed methods)
> or, if it really is needed for those, done via schedule_work() or equivalent
> done by ->destroy_inode().

If you do ->destroy_inode() alone, you must use an explicit call_rcu() in there
(or in ->evict_inode(), for that matter), with everything that must be RCU-delayed
done via that callback; strongly discouraged, though, since it's easier to leave
that to fs/inode.c by turning that callback into ->free_inode().

> Seeing that bpffs has the grand total of zero RCU-exposed methods (no ->d_compare(),
> no ->d_hash(), no ->permission(), no ->d_revalidate(), no ->get_link()) I would
> guess that it's the case of "have your bpf_any_put() done promptly, leave freeing
> the inode and cached symlink body RCU-delayed".

Other than bpffs there are only two instances of super_operations that have non-NULL
->destroy_inode() and NULL ->free_inode():

	static const struct super_operations pipefs_ops = {
		.destroy_inode = free_inode_nonrcu,
		.statfs = simple_statfs,
	};

which is fine, since pipefs inodes are not exposed to RCU pathwalk at all and

	static const struct super_operations btrfs_test_super_ops = {
		.alloc_inode = btrfs_alloc_inode,
		.destroy_inode = btrfs_test_destroy_inode,
	};

which is definitely not fine, but since that thing is not exposed to regular
syscalls (only to odd internal selftests, what with not being user-mountable),
presumably it gets away with that. AFAICS, it may end up calling
	cond_resched_rwlock_write(&tree->lock);
from drop_all_extent_maps_fast(), from btrfs_drop_extent_map_range(), called
in btrfs_test_destroy_inode(), so it probably needs to leave that call
of btrfs_drop_extent_map_range() in ->destroy_inode() and use their
regular btrfs_free_inode() for ->free_inode().
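
For readers following along, here is why the "non-NULL ->destroy_inode(), NULL ->free_inode()" combination is the one worth auditing, modelled in userspace. This is a minimal sketch that paraphrases, from memory, how fs/inode.c chooses between the two callbacks. Everything named toy_* is hypothetical and exists only for illustration; the rcu_queued flag merely stands in for the call_rcu() step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode;

/* Hypothetical stand-in for the two super_operations members involved. */
struct toy_ops {
	void (*destroy_inode)(struct toy_inode *);
	void (*free_inode)(struct toy_inode *);
};

struct toy_inode {
	const struct toy_ops *ops;
	bool destroy_called;	/* set by the prompt callback */
	bool rcu_queued;	/* models queueing the free via call_rcu() */
};

/* Example callbacks; toy_delayed() only serves as a non-NULL marker here. */
static void toy_prompt(struct toy_inode *inode) { inode->destroy_called = true; }
static void toy_delayed(struct toy_inode *inode) { (void)inode; }

/*
 * Model of the dispatch: ->destroy_inode() always runs promptly; the
 * RCU-delayed step is skipped only when ->destroy_inode() is set and
 * ->free_inode() is not.
 */
static void toy_destroy_inode(struct toy_inode *inode)
{
	const struct toy_ops *ops = inode->ops;

	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return;		/* filesystem owns freeing; no delay */
	}
	inode->rcu_queued = true;	/* freeing happens after a grace period */
}

/*
 * Returns true if tearing down an inode with these ops is RCU-delayed.
 * Assumes a non-NULL ->destroy_inode is toy_prompt().
 */
static bool toy_is_rcu_delayed(const struct toy_ops *ops)
{
	struct toy_inode inode = { .ops = ops };

	toy_destroy_inode(&inode);
	/* A prompt callback, if present, must have run by now. */
	assert(!ops->destroy_inode || inode.destroy_called);
	return inode.rcu_queued;
}
```

In this model only the { toy_prompt, NULL } table comes back as not RCU-delayed, which corresponds to the pipefs_ops and btrfs_test_super_ops shape quoted above; supplying both callbacks keeps the prompt step and still gets the delay.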
* Re: [BUG] KASAN: slab-use-after-free in link_path_walk
From: Alexei Starovoitov @ 2026-04-24 1:06 UTC
To: Al Viro, Eulgyu Kim
Cc: brauner, jack, linux-fsdevel, linux-kernel, byoungyoung,
jjy600901, Alexei Starovoitov, KaFai Wan, Yonghong Song, bpf,
Al Viro
On Wed Apr 22, 2026 at 9:39 PM PDT, Al Viro wrote:
> On Thu, Apr 23, 2026 at 10:39:16AM +0900, Eulgyu Kim wrote:
>
>> We suspect there is a race condition between vfs_rmdir() and may_lookup()
>> on the BPF pseudo filesystem. It seems that while link_path_walk() is walking
>> a path, its call to may_lookup() checks permissions on the current directory
>> inode through nd->inode, and vfs_rmdir() can remove that same directory and
>> trigger inode destruction, leading to a use-after-free.
>
> Not really. What happens is that bpf does prompt freeing of struct inode, instead
> of having it done with RCU delay. Everything else is a result of that.
>
> What's going on there? It used to be in ->free_inode(); who had moved that into
> ->destroy_inode(), why had that been done, who had ACKed that and how have I
> missed the discussions on fsdevel?
>
> <digs>
>
> commit 4f375ade6aa9f37fd72d7a78682f639772089eed
> Author: KaFai Wan <kafai.wan@linux.dev>
> Date: Wed Oct 8 18:26:26 2025 +0800
>
> bpf: Avoid RCU context warning when unpinning htab with internal structs
>
> [blocking stuff done from RCU-delayed callback, so let's make everything prompt,
> whaddya mean, what was the delay for?]
>
> Reported-by: Le Chen <tom2cat@sjtu.edu.cn>
> Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/
> Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.")
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> Link: https://lore.kernel.org/r/20251008102628.808045-2-kafai.wan@linux.dev
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>
> OK, that answers some of that... <looks the posting up>
> To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
> martin.lau@linux.dev, eddyz87@gmail.com, song@kernel.org,
> yonghong.song@linux.dev, john.fastabend@gmail.com,
> kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
> jolsa@kernel.org, shuah@kernel.org, kafai.wan@linux.dev,
> toke@redhat.com, linux-kernel@vger.kernel.org,
> bpf@vger.kernel.org, linux-kselftest@vger.kernel.org
>
> ... right, that probably answers the last one. Incidentally, that commit has
> brought back the old bug with cached symlink bodies getting freed without RCU delay.
> It is possible that it was discussed on fsdevel at some point and I'd missed it
> there, but...
>
> Folks, the rules are simple:
> * anything that might be accessed in RCU mode (inode very much included
> for objects that are visible in the tree) must be freed after RCU delay; that's
> what ->free_inode() is for.
> * anything that can't be freed in such context should either be
> dealt with in ->destroy_inode() (if it isn't needed for RCU-exposed methods)
> or, if it really is needed for those, done via schedule_work() or equivalent
> done by ->destroy_inode().
>
> Seeing that bpffs has the grand total of zero RCU-exposed methods (no ->d_compare(),
> no ->d_hash(), no ->permission(), no ->d_revalidate(), no ->get_link()) I would
> guess that it's the case of "have your bpf_any_put() done promptly, leave freeing
> the inode and cached symlink body RCU-delayed". Something like the delta below
> (completely untested):
>
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 25c06a011825..bd052a8e89a9 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -762,14 +762,26 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
>  	return 0;
>  }
>
> +// this is done promptly
>  static void bpf_destroy_inode(struct inode *inode)
>  {
>  	enum bpf_type type;
>
> -	if (S_ISLNK(inode->i_mode))
> -		kfree(inode->i_link);
> +	// better done here, since it's blocking and we'd need
> +	// to use something like schedule_work() to do it from
> +	// ->free_inode(); since this stuff doesn't need to
> +	// be delayed, doing it here is less headache.
>  	if (!bpf_inode_type(inode, &type))
>  		bpf_any_put(inode->i_private, type);
> +}
> +
> +// ... and this is done with RCU delay; anything that might be accessed
> +// by RCU pathwalk (like, you know, inode and symlink contents) should be
> +// dealt with here
> +static void bpf_free_inode(struct inode *inode)
> +{
> +	if (S_ISLNK(inode->i_mode))
> +		kfree(inode->i_link);
>  	free_inode_nonrcu(inode);
>  }
>
> @@ -778,6 +790,7 @@ const struct super_operations bpf_super_ops = {
>  	.drop_inode = inode_just_drop,
>  	.show_options = bpf_show_options,
>  	.destroy_inode = bpf_destroy_inode,
> +	.free_inode = bpf_free_inode,
>  };

Thanks for the fix.
Feel free to take it into your tree directly.
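
To make the effect of the split concrete, here is a rough single-threaded userspace model of the behaviour the patch is after: the private object is released at once, while the parts a lockless reader might still be looking at are freed only after the reader is done. All toy_* names are hypothetical, and the reader counter is a deliberately crude stand-in for an RCU grace period (real RCU only waits for readers that started before the callback was queued, and never runs the callback from inside call_rcu() itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEFERRED 8

struct toy_object {
	bool private_released;	/* models bpf_any_put(inode->i_private, type) */
	bool freed;		/* models kfree(i_link) plus freeing the inode */
};

/* Hypothetical single-threaded stand-in for RCU bookkeeping. */
static int toy_readers;
static struct toy_object *toy_deferred[TOY_MAX_DEFERRED];
static size_t toy_nr_deferred;

static void toy_read_lock(void) { toy_readers++; }

/* Run the deferred frees, but only once no reader remains. */
static void toy_flush(void)
{
	if (toy_readers > 0)
		return;
	for (size_t i = 0; i < toy_nr_deferred; i++)
		toy_deferred[i]->freed = true;
	toy_nr_deferred = 0;
}

static void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	toy_readers--;
	toy_flush();
}

/* Teardown split into a prompt part and a delayed part. */
static void toy_teardown(struct toy_object *obj)
{
	obj->private_released = true;		/* prompt part */
	assert(toy_nr_deferred < TOY_MAX_DEFERRED);
	toy_deferred[toy_nr_deferred++] = obj;	/* delayed part */
	toy_flush();
}
```

With a reader active (toy_read_lock() called first), toy_teardown() sets private_released immediately but leaves freed false until the matching toy_read_unlock(); that window is what the pathwalk in the KASAN report depends on.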