* Re: [syzbot ci] Re: Close race in freeing special fields and map value
2026-02-26 9:31 ` [syzbot ci] Re: Close race in freeing special fields and map value syzbot ci
@ 2026-02-26 15:05 ` Kumar Kartikeya Dwivedi
2026-02-26 17:36 ` Kumar Kartikeya Dwivedi
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-02-26 15:05 UTC (permalink / raw)
To: syzbot ci
Cc: andrii, ast, bpf, daniel, eddyz87, kernel-team, kkd, martin.lau,
paulmck, yatsenko, syzbot, syzkaller-bugs
On Thu, 26 Feb 2026 at 10:31, syzbot ci
<syzbot+ci3826af8b4f91bf97@syzkaller.appspotmail.com> wrote:
>
> syzbot ci has tested the following series
>
> [v1] Close race in freeing special fields and map value
> https://lore.kernel.org/all/20260225185121.2057388-1-memxor@gmail.com
> * [PATCH bpf-next v1 1/4] bpf: Register dtor for freeing special fields
> * [PATCH bpf-next v1 2/4] bpf: Delay freeing fields in local storage
> * [PATCH bpf-next v1 3/4] bpf: Retire rcu_trace_implies_rcu_gp() from local storage
> * [PATCH bpf-next v1 4/4] selftests/bpf: Add tests for special fields races
>
> and found the following issue:
> KASAN: slab-use-after-free Read in free_all
>
> Full report is available here:
> https://ci.syzbot.org/series/ad9c7ce6-d861-4afe-821a-e8fae6120b12
>
> ***
>
> KASAN: slab-use-after-free Read in free_all
>
> tree: bpf-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
> base: f620af11c27b8ec9994a39fe968aa778112d1566
> arch: amd64
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> config: https://ci.syzbot.org/builds/b2735b0b-963d-41a7-bb33-db10494a24e1/config
> syz repro: https://ci.syzbot.org/findings/3ee61f99-7d80-4986-a0b1-492ffb6b1c46/syz_repro
>
> ==================================================================
> BUG: KASAN: slab-use-after-free in free_all+0x84/0x140 kernel/bpf/memalloc.c:271
> Read of size 8 at addr ffff888112ee9260 by task swapper/0/0
>
> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
> <IRQ>
> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
> print_address_description mm/kasan/report.c:378 [inline]
> print_report+0xba/0x230 mm/kasan/report.c:482
> kasan_report+0x117/0x150 mm/kasan/report.c:595
> free_all+0x84/0x140 kernel/bpf/memalloc.c:271
> do_call_rcu_ttrace+0x385/0x400 kernel/bpf/memalloc.c:315
> __free_by_rcu+0x23b/0x3f0 kernel/bpf/memalloc.c:381
> rcu_do_batch kernel/rcu/tree.c:2617 [inline]
> rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
> handle_softirqs+0x22a/0x870 kernel/softirq.c:622
> __do_softirq kernel/softirq.c:656 [inline]
> invoke_softirq kernel/softirq.c:496 [inline]
> __irq_exit_rcu+0x5f/0x150 kernel/softirq.c:723
> irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
> instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1056 [inline]
> sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1056
> </IRQ>
> <TASK>
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
> RIP: 0010:pv_native_safe_halt+0xf/0x20 arch/x86/kernel/paravirt.c:63
> Code: 8e 6d 02 c3 cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 43 3d 1c 00 fb f4 <e9> 7c ea 02 00 cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90
> RSP: 0018:ffffffff8e407dc0 EFLAGS: 00000202
> RAX: 00000000000d2297 RBX: ffffffff819a80ad RCX: 0000000080000001
> RDX: 0000000000000001 RSI: ffffffff8def22fa RDI: ffffffff8c27a800
> RBP: ffffffff8e407eb0 R08: ffff88812103395b R09: 1ffff1102420672b
> R10: dffffc0000000000 R11: ffffed102420672c R12: ffffffff90117db0
> R13: 1ffffffff1c929d8 R14: 0000000000000000 R15: 0000000000000000
> arch_safe_halt arch/x86/kernel/process.c:766 [inline]
> default_idle+0x9/0x20 arch/x86/kernel/process.c:767
> default_idle_call+0x72/0xb0 kernel/sched/idle.c:122
> cpuidle_idle_call kernel/sched/idle.c:191 [inline]
> do_idle+0x1bd/0x500 kernel/sched/idle.c:332
> cpu_startup_entry+0x43/0x60 kernel/sched/idle.c:430
> rest_init+0x2de/0x300 init/main.c:760
> start_kernel+0x385/0x3d0 init/main.c:1210
> x86_64_start_reservations+0x24/0x30 arch/x86/kernel/head64.c:310
> x86_64_start_kernel+0x143/0x1c0 arch/x86/kernel/head64.c:291
> common_startup_64+0x13e/0x147
> </TASK>
>
> Allocated by task 5961:
> kasan_save_stack mm/kasan/common.c:57 [inline]
> kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
> __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
> kasan_kmalloc include/linux/kasan.h:263 [inline]
> __do_kmalloc_node mm/slub.c:5219 [inline]
> __kmalloc_node_noprof+0x4e0/0x7c0 mm/slub.c:5225
> kmalloc_node_noprof include/linux/slab.h:1093 [inline]
> __bpf_map_area_alloc kernel/bpf/syscall.c:398 [inline]
> bpf_map_area_alloc+0x64/0x170 kernel/bpf/syscall.c:411
> trie_alloc+0x14f/0x340 kernel/bpf/lpm_trie.c:588
> map_create+0xafd/0x16a0 kernel/bpf/syscall.c:1507
> __sys_bpf+0x6e1/0x950 kernel/bpf/syscall.c:6210
> __do_sys_bpf kernel/bpf/syscall.c:6341 [inline]
> __se_sys_bpf kernel/bpf/syscall.c:6339 [inline]
> __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6339
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Freed by task 83:
> kasan_save_stack mm/kasan/common.c:57 [inline]
> kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
> kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
> poison_slab_object mm/kasan/common.c:253 [inline]
> __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
> kasan_slab_free include/linux/kasan.h:235 [inline]
> slab_free_hook mm/slub.c:2687 [inline]
> slab_free mm/slub.c:6124 [inline]
> kfree+0x1c1/0x630 mm/slub.c:6442
> bpf_map_free kernel/bpf/syscall.c:892 [inline]
> bpf_map_free_deferred+0x217/0x460 kernel/bpf/syscall.c:919
> process_one_work kernel/workqueue.c:3275 [inline]
> process_scheduled_works+0xb02/0x1830 kernel/workqueue.c:3358
> worker_thread+0xa50/0xfc0 kernel/workqueue.c:3439
> kthread+0x388/0x470 kernel/kthread.c:467
> ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>
> Last potentially related work creation:
> kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
> kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
> insert_work+0x3d/0x330 kernel/workqueue.c:2199
> __queue_work+0xd03/0x1020 kernel/workqueue.c:2354
> queue_work_on+0x106/0x1d0 kernel/workqueue.c:2405
> bpf_map_put_with_uref kernel/bpf/syscall.c:975 [inline]
> bpf_map_release+0x127/0x140 kernel/bpf/syscall.c:985
> __fput+0x44f/0xa70 fs/file_table.c:469
> task_work_run+0x1d9/0x270 kernel/task_work.c:233
> resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
> __exit_to_user_mode_loop kernel/entry/common.c:67 [inline]
> exit_to_user_mode_loop+0xed/0x480 kernel/entry/common.c:98
> __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
> syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
> syscall_exit_to_user_mode include/linux/entry-common.h:325 [inline]
> do_syscall_64+0x32d/0xf80 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> [...]
This is because 'ma' is embedded in the map struct, and we have an
optimization that speeds up map destruction by copying the map out with
kmemdup() instead of waiting for pending callbacks' rcu_barrier().
Typically no callbacks are still in progress by the time the map is
gone, so the tests never hit this, but a UAF can occur when we
kmemdup() and then proceed to free the bpf_map while callbacks are
still pending, at which point the 'ma' pointer they hold becomes
invalid.
Just storing the dtor and dtor ctx directly is not ok either, since the
dtor accesses the map struct and its btf_record to decide how to free
fields, which causes the same UAF. The right fix, as far as I can tell,
is to duplicate the btf_record (it's not very big anyway) from the map
and store that as the dtor ctx. To remove the dependency on ma->dtor,
store the dtor directly in the bpf_mem_caches along with the dtor ctx.
The dtor can then take care of everything else.
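To illustrate the ownership idea, here is a minimal userspace sketch
(all names and structs hypothetical, not the kernel code itself): a
deferred destructor must not dereference the container it was embedded
in, so the small piece of state it needs is duplicated into a ctx the
callback owns outright.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Miniature of the race: a pending callback's ctx used to point into a
 * larger object (the "map") that can be freed before the callback runs.
 * The fix mirrors the one described above: duplicate the needed state
 * (the btf_record analogue) so the callback never touches the container. */

struct record { int nfields; };

struct map {
	struct record rec;
	/* ... other fields the callback must not depend on ... */
};

struct dtor_ctx {
	struct record *rec;	/* duplicated copy, owned by the ctx */
};

/* Duplicate just what the destructor needs out of the map. */
static struct dtor_ctx *ctx_dup(const struct map *m)
{
	struct dtor_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->rec = malloc(sizeof(*ctx->rec));
	if (!ctx->rec) {
		free(ctx);
		return NULL;
	}
	memcpy(ctx->rec, &m->rec, sizeof(*ctx->rec));
	return ctx;
}

/* Counterpart of htab_dtor_ctx_free(): ctx owns its record. */
static void ctx_free(struct dtor_ctx *ctx)
{
	free(ctx->rec);
	free(ctx);
}
```

With this shape, the container can be freed early (as in
bpf_map_free_deferred()) without invalidating anything the pending
callback will read.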
^ permalink raw reply	[flat|nested] 8+ messages in thread

* [syzbot ci] Re: Close race in freeing special fields and map value
2026-02-26 9:31 ` [syzbot ci] Re: Close race in freeing special fields and map value syzbot ci
2026-02-26 15:05 ` Kumar Kartikeya Dwivedi
@ 2026-02-26 17:36 ` Kumar Kartikeya Dwivedi
2026-02-26 17:38 ` syzbot ci
2026-02-26 18:28 ` Kumar Kartikeya Dwivedi
2026-02-26 18:36 ` [PATCH] please work syzbot Kumar Kartikeya Dwivedi
3 siblings, 1 reply; 8+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-02-26 17:36 UTC (permalink / raw)
To: syzbot+ci3826af8b4f91bf97
Cc: andrii, ast, bpf, daniel, eddyz87, kernel-team, kkd, martin.lau,
memxor, paulmck, syzbot, syzkaller-bugs, yatsenko
#syz test
---
diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 7517eaf94ac9..4ce0d27f8ea2 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -14,7 +14,7 @@ struct bpf_mem_alloc {
struct obj_cgroup *objcg;
bool percpu;
struct work_struct work;
- void (*dtor)(void *obj, void *ctx);
+ void (*dtor_ctx_free)(void *ctx);
void *dtor_ctx;
};
@@ -35,7 +35,9 @@ int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma, struct obj_cgroup *objcg
int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
- void (*dtor)(void *obj, void *ctx), void *ctx);
+ void (*dtor)(void *obj, void *ctx),
+ void (*dtor_ctx_free)(void *ctx),
+ void *ctx);
/* Check the allocation size for kmalloc equivalent allocator */
int bpf_mem_alloc_check_size(bool percpu, size_t size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 74f7a6f44c50..582f0192b7e1 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -125,6 +125,11 @@ struct htab_elem {
char key[] __aligned(8);
};
+struct htab_btf_record {
+ struct btf_record *record;
+ u32 key_size;
+};
+
static inline bool htab_is_prealloc(const struct bpf_htab *htab)
{
return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
@@ -459,28 +464,80 @@ static int htab_map_alloc_check(union bpf_attr *attr)
static void htab_mem_dtor(void *obj, void *ctx)
{
- struct bpf_htab *htab = ctx;
+ struct htab_btf_record *hrec = ctx;
struct htab_elem *elem = obj;
void *map_value;
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(hrec->record))
return;
- map_value = htab_elem_value(elem, htab->map.key_size);
- bpf_obj_free_fields(htab->map.record, map_value);
+ map_value = htab_elem_value(elem, hrec->key_size);
+ bpf_obj_free_fields(hrec->record, map_value);
}
static void htab_pcpu_mem_dtor(void *obj, void *ctx)
{
- struct bpf_htab *htab = ctx;
void __percpu *pptr = *(void __percpu **)obj;
+ struct htab_btf_record *hrec = ctx;
int cpu;
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(hrec->record))
return;
for_each_possible_cpu(cpu)
- bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
+ bpf_obj_free_fields(hrec->record, per_cpu_ptr(pptr, cpu));
+}
+
+static void htab_dtor_ctx_free(void *ctx)
+{
+ struct htab_btf_record *hrec = ctx;
+
+ btf_record_free(hrec->record);
+ kfree(ctx);
+}
+
+static int htab_set_dtor(const struct bpf_htab *htab, void (*dtor)(void *, void *))
+{
+ u32 key_size = htab->map.key_size;
+ const struct bpf_mem_alloc *ma;
+ struct htab_btf_record *hrec;
+ int err;
+
+ /* No need for dtors. */
+ if (IS_ERR_OR_NULL(htab->map.record))
+ return 0;
+
+ hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
+ if (!hrec)
+ return -ENOMEM;
+ hrec->key_size = key_size;
+ hrec->record = btf_record_dup(htab->map.record);
+ if (IS_ERR(hrec->record)) {
+ err = PTR_ERR(hrec->record);
+ kfree(hrec);
+ return err;
+ }
+ ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
+ /* Kinda sad, but cast away const-ness since we change ma->dtor. */
+ bpf_mem_alloc_set_dtor((struct bpf_mem_alloc *)ma, dtor, htab_dtor_ctx_free, hrec);
+ return 0;
+}
+
+static int htab_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+ const struct btf_type *key_type, const struct btf_type *value_type)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+ if (htab_is_prealloc(htab))
+ return 0;
+ /*
+ * We must set the dtor using this callback, as map's BTF record is not
+ * populated in htab_map_alloc(), so it will always appear as NULL.
+ */
+ if (htab_is_percpu(htab))
+ return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+ else
+ return htab_set_dtor(htab, htab_mem_dtor);
}
static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
@@ -595,17 +652,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
round_up(htab->map.value_size, 8), true);
if (err)
goto free_map_locked;
- /* See comment below */
- bpf_mem_alloc_set_dtor(&htab->pcpu_ma, htab_pcpu_mem_dtor, htab);
- } else {
- /*
- * Register the dtor unconditionally. map->record is
- * set by map_create() after map_alloc() returns, so it
- * is always NULL at this point. The dtor checks
- * IS_ERR_OR_NULL(htab->map.record) and becomes a no-op
- * for maps without special fields.
- */
- bpf_mem_alloc_set_dtor(&htab->ma, htab_mem_dtor, htab);
}
}
@@ -2318,6 +2364,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2340,6 +2387,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2519,6 +2567,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_percpu),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2539,6 +2588,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru_percpu),
.map_btf_id = &htab_map_btf_ids[0],
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 137e855c718b..682a9f34214b 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -102,7 +102,8 @@ struct bpf_mem_cache {
int percpu_size;
bool draining;
struct bpf_mem_cache *tgt;
- struct bpf_mem_alloc *ma;
+ void (*dtor)(void *obj, void *ctx);
+ void *dtor_ctx;
/* list of objects to be freed after RCU GP */
struct llist_head free_by_rcu;
@@ -261,15 +262,14 @@ static void free_one(void *obj, bool percpu)
kfree(obj);
}
-static int free_all(struct llist_node *llnode, bool percpu,
- struct bpf_mem_alloc *ma)
+static int free_all(struct bpf_mem_cache *c, struct llist_node *llnode, bool percpu)
{
struct llist_node *pos, *t;
int cnt = 0;
llist_for_each_safe(pos, t, llnode) {
- if (ma->dtor)
- ma->dtor((void *)pos + LLIST_NODE_SZ, ma->dtor_ctx);
+ if (c->dtor)
+ c->dtor((void *)pos + LLIST_NODE_SZ, c->dtor_ctx);
free_one(pos, percpu);
cnt++;
}
@@ -280,7 +280,7 @@ static void __free_rcu(struct rcu_head *head)
{
struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace);
- free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size, c->ma);
+ free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
atomic_set(&c->call_rcu_ttrace_in_progress, 0);
}
@@ -312,7 +312,7 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
if (unlikely(READ_ONCE(c->draining))) {
llnode = llist_del_all(&c->free_by_rcu_ttrace);
- free_all(llnode, !!c->percpu_size, c->ma);
+ free_all(c, llnode, !!c->percpu_size);
}
return;
}
@@ -421,7 +421,7 @@ static void check_free_by_rcu(struct bpf_mem_cache *c)
dec_active(c, &flags);
if (unlikely(READ_ONCE(c->draining))) {
- free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size, c->ma);
+ free_all(c, llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
atomic_set(&c->call_rcu_in_progress, 0);
} else {
call_rcu_hurry(&c->rcu, __free_by_rcu);
@@ -546,7 +546,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
}
@@ -569,7 +568,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
@@ -622,7 +620,6 @@ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
@@ -631,7 +628,7 @@ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
return 0;
}
-static void drain_mem_cache(struct bpf_mem_cache *c, struct bpf_mem_alloc *ma)
+static void drain_mem_cache(struct bpf_mem_cache *c)
{
bool percpu = !!c->percpu_size;
@@ -642,13 +639,13 @@ static void drain_mem_cache(struct bpf_mem_cache *c, struct bpf_mem_alloc *ma)
* Except for waiting_for_gp_ttrace list, there are no concurrent operations
* on these lists, so it is safe to use __llist_del_all().
*/
- free_all(llist_del_all(&c->free_by_rcu_ttrace), percpu, ma);
- free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu, ma);
- free_all(__llist_del_all(&c->free_llist), percpu, ma);
- free_all(__llist_del_all(&c->free_llist_extra), percpu, ma);
- free_all(__llist_del_all(&c->free_by_rcu), percpu, ma);
- free_all(__llist_del_all(&c->free_llist_extra_rcu), percpu, ma);
- free_all(llist_del_all(&c->waiting_for_gp), percpu, ma);
+ free_all(c, llist_del_all(&c->free_by_rcu_ttrace), percpu);
+ free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), percpu);
+ free_all(c, __llist_del_all(&c->free_llist), percpu);
+ free_all(c, __llist_del_all(&c->free_llist_extra), percpu);
+ free_all(c, __llist_del_all(&c->free_by_rcu), percpu);
+ free_all(c, __llist_del_all(&c->free_llist_extra_rcu), percpu);
+ free_all(c, llist_del_all(&c->waiting_for_gp), percpu);
}
static void check_mem_cache(struct bpf_mem_cache *c)
@@ -687,6 +684,9 @@ static void check_leaked_objs(struct bpf_mem_alloc *ma)
static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
{
+ /* We can free dtor ctx only once all callbacks are done using it. */
+ if (ma->dtor_ctx_free)
+ ma->dtor_ctx_free(ma->dtor_ctx);
check_leaked_objs(ma);
free_percpu(ma->cache);
free_percpu(ma->caches);
@@ -758,7 +758,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
c = per_cpu_ptr(ma->cache, cpu);
WRITE_ONCE(c->draining, true);
irq_work_sync(&c->refill_work);
- drain_mem_cache(c, ma);
+ drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
@@ -773,7 +773,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
c = &cc->cache[i];
WRITE_ONCE(c->draining, true);
irq_work_sync(&c->refill_work);
- drain_mem_cache(c, ma);
+ drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
@@ -1022,9 +1022,31 @@ int bpf_mem_alloc_check_size(bool percpu, size_t size)
return 0;
}
-void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
- void (*dtor)(void *obj, void *ctx), void *ctx)
+void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, void (*dtor)(void *obj, void *ctx),
+ void (*dtor_ctx_free)(void *ctx), void *ctx)
{
- ma->dtor = dtor;
+ struct bpf_mem_caches *cc;
+ struct bpf_mem_cache *c;
+ int cpu, i;
+
+ ma->dtor_ctx_free = dtor_ctx_free;
ma->dtor_ctx = ctx;
+
+ if (ma->cache) {
+ for_each_possible_cpu(cpu) {
+ c = per_cpu_ptr(ma->cache, cpu);
+ c->dtor = dtor;
+ c->dtor_ctx = ctx;
+ }
+ }
+ if (ma->caches) {
+ for_each_possible_cpu(cpu) {
+ cc = per_cpu_ptr(ma->caches, cpu);
+ for (i = 0; i < NUM_CACHES; i++) {
+ c = &cc->cache[i];
+ c->dtor = dtor;
+ c->dtor_ctx = ctx;
+ }
+ }
+ }
}
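As a rough userspace analogue of the reworked free_all() above (all
names hypothetical, lists simplified to a plain singly linked list):
each cache now carries its own dtor and ctx, so the free path never
reaches back into a bpf_mem_alloc that may already be gone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for bpf_mem_cache: the dtor and its ctx live in
 * the cache itself, not behind a back-pointer to the allocator. */
struct cache {
	void (*dtor)(void *obj, void *ctx);
	void *dtor_ctx;
};

struct node {
	struct node *next;
	int payload;
};

/* Analogue of free_all(c, llnode, percpu): run the per-cache dtor on
 * each object, then free it, returning the count as the kernel version
 * does. */
static int free_all(struct cache *c, struct node *head)
{
	int cnt = 0;

	while (head) {
		struct node *next = head->next;

		if (c->dtor)
			c->dtor(&head->payload, c->dtor_ctx);
		free(head);
		head = next;
		cnt++;
	}
	return cnt;
}

/* Example dtor: just counts invocations via its ctx. */
static void count_dtor(void *obj, void *ctx)
{
	(void)obj;
	(*(int *)ctx)++;
}
```

The design point is the same as in the patch: once the dtor pointer and
ctx are snapshotted into every cache by bpf_mem_alloc_set_dtor(), the
RCU callbacks only dereference per-cache state plus the duplicated ctx,
both of which outlive the map.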
^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [syzbot ci] Re: Close race in freeing special fields and map value
2026-02-26 17:36 ` Kumar Kartikeya Dwivedi
@ 2026-02-26 17:38 ` syzbot ci
0 siblings, 0 replies; 8+ messages in thread
From: syzbot ci @ 2026-02-26 17:38 UTC (permalink / raw)
To: memxor
Cc: andrii, ast, bpf, daniel, eddyz87, kernel-team, kkd, martin.lau,
memxor, paulmck, syzbot, syzkaller-bugs, yatsenko
Unknown command
^ permalink raw reply [flat|nested] 8+ messages in thread
* [syzbot ci] Re: Close race in freeing special fields and map value
2026-02-26 9:31 ` [syzbot ci] Re: Close race in freeing special fields and map value syzbot ci
2026-02-26 15:05 ` Kumar Kartikeya Dwivedi
2026-02-26 17:36 ` Kumar Kartikeya Dwivedi
@ 2026-02-26 18:28 ` Kumar Kartikeya Dwivedi
2026-02-26 18:30 ` syzbot ci
2026-02-26 18:36 ` [PATCH] please work syzbot Kumar Kartikeya Dwivedi
3 siblings, 1 reply; 8+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-02-26 18:28 UTC (permalink / raw)
To: syzbot+ci3826af8b4f91bf97
Cc: andrii, ast, bpf, daniel, eddyz87, kernel-team, kkd, martin.lau,
memxor, paulmck, syzbot, syzkaller-bugs, yatsenko
#syz test:
diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 7517eaf94ac9..4ce0d27f8ea2 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -14,7 +14,7 @@ struct bpf_mem_alloc {
struct obj_cgroup *objcg;
bool percpu;
struct work_struct work;
- void (*dtor)(void *obj, void *ctx);
+ void (*dtor_ctx_free)(void *ctx);
void *dtor_ctx;
};
@@ -35,7 +35,9 @@ int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma, struct obj_cgroup *objcg
int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
- void (*dtor)(void *obj, void *ctx), void *ctx);
+ void (*dtor)(void *obj, void *ctx),
+ void (*dtor_ctx_free)(void *ctx),
+ void *ctx);
/* Check the allocation size for kmalloc equivalent allocator */
int bpf_mem_alloc_check_size(bool percpu, size_t size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 74f7a6f44c50..582f0192b7e1 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -125,6 +125,11 @@ struct htab_elem {
char key[] __aligned(8);
};
+struct htab_btf_record {
+ struct btf_record *record;
+ u32 key_size;
+};
+
static inline bool htab_is_prealloc(const struct bpf_htab *htab)
{
return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
@@ -459,28 +464,80 @@ static int htab_map_alloc_check(union bpf_attr *attr)
static void htab_mem_dtor(void *obj, void *ctx)
{
- struct bpf_htab *htab = ctx;
+ struct htab_btf_record *hrec = ctx;
struct htab_elem *elem = obj;
void *map_value;
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(hrec->record))
return;
- map_value = htab_elem_value(elem, htab->map.key_size);
- bpf_obj_free_fields(htab->map.record, map_value);
+ map_value = htab_elem_value(elem, hrec->key_size);
+ bpf_obj_free_fields(hrec->record, map_value);
}
static void htab_pcpu_mem_dtor(void *obj, void *ctx)
{
- struct bpf_htab *htab = ctx;
void __percpu *pptr = *(void __percpu **)obj;
+ struct htab_btf_record *hrec = ctx;
int cpu;
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(hrec->record))
return;
for_each_possible_cpu(cpu)
- bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
+ bpf_obj_free_fields(hrec->record, per_cpu_ptr(pptr, cpu));
+}
+
+static void htab_dtor_ctx_free(void *ctx)
+{
+ struct htab_btf_record *hrec = ctx;
+
+ btf_record_free(hrec->record);
+ kfree(ctx);
+}
+
+static int htab_set_dtor(const struct bpf_htab *htab, void (*dtor)(void *, void *))
+{
+ u32 key_size = htab->map.key_size;
+ const struct bpf_mem_alloc *ma;
+ struct htab_btf_record *hrec;
+ int err;
+
+ /* No need for dtors. */
+ if (IS_ERR_OR_NULL(htab->map.record))
+ return 0;
+
+ hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
+ if (!hrec)
+ return -ENOMEM;
+ hrec->key_size = key_size;
+ hrec->record = btf_record_dup(htab->map.record);
+ if (IS_ERR(hrec->record)) {
+ err = PTR_ERR(hrec->record);
+ kfree(hrec);
+ return err;
+ }
+ ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
+ /* Kinda sad, but cast away const-ness since we change ma->dtor. */
+ bpf_mem_alloc_set_dtor((struct bpf_mem_alloc *)ma, dtor, htab_dtor_ctx_free, hrec);
+ return 0;
+}
+
+static int htab_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+ const struct btf_type *key_type, const struct btf_type *value_type)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+ if (htab_is_prealloc(htab))
+ return 0;
+ /*
+ * We must set the dtor using this callback, as map's BTF record is not
+ * populated in htab_map_alloc(), so it will always appear as NULL.
+ */
+ if (htab_is_percpu(htab))
+ return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+ else
+ return htab_set_dtor(htab, htab_mem_dtor);
}
static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
@@ -595,17 +652,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
round_up(htab->map.value_size, 8), true);
if (err)
goto free_map_locked;
- /* See comment below */
- bpf_mem_alloc_set_dtor(&htab->pcpu_ma, htab_pcpu_mem_dtor, htab);
- } else {
- /*
- * Register the dtor unconditionally. map->record is
- * set by map_create() after map_alloc() returns, so it
- * is always NULL at this point. The dtor checks
- * IS_ERR_OR_NULL(htab->map.record) and becomes a no-op
- * for maps without special fields.
- */
- bpf_mem_alloc_set_dtor(&htab->ma, htab_mem_dtor, htab);
}
}
@@ -2318,6 +2364,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2340,6 +2387,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2519,6 +2567,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_percpu),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2539,6 +2588,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru_percpu),
.map_btf_id = &htab_map_btf_ids[0],
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 137e855c718b..682a9f34214b 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -102,7 +102,8 @@ struct bpf_mem_cache {
int percpu_size;
bool draining;
struct bpf_mem_cache *tgt;
- struct bpf_mem_alloc *ma;
+ void (*dtor)(void *obj, void *ctx);
+ void *dtor_ctx;
/* list of objects to be freed after RCU GP */
struct llist_head free_by_rcu;
@@ -261,15 +262,14 @@ static void free_one(void *obj, bool percpu)
kfree(obj);
}
-static int free_all(struct llist_node *llnode, bool percpu,
- struct bpf_mem_alloc *ma)
+static int free_all(struct bpf_mem_cache *c, struct llist_node *llnode, bool percpu)
{
struct llist_node *pos, *t;
int cnt = 0;
llist_for_each_safe(pos, t, llnode) {
- if (ma->dtor)
- ma->dtor((void *)pos + LLIST_NODE_SZ, ma->dtor_ctx);
+ if (c->dtor)
+ c->dtor((void *)pos + LLIST_NODE_SZ, c->dtor_ctx);
free_one(pos, percpu);
cnt++;
}
@@ -280,7 +280,7 @@ static void __free_rcu(struct rcu_head *head)
{
struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace);
- free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size, c->ma);
+ free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
atomic_set(&c->call_rcu_ttrace_in_progress, 0);
}
@@ -312,7 +312,7 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
if (unlikely(READ_ONCE(c->draining))) {
llnode = llist_del_all(&c->free_by_rcu_ttrace);
- free_all(llnode, !!c->percpu_size, c->ma);
+ free_all(c, llnode, !!c->percpu_size);
}
return;
}
@@ -421,7 +421,7 @@ static void check_free_by_rcu(struct bpf_mem_cache *c)
dec_active(c, &flags);
if (unlikely(READ_ONCE(c->draining))) {
- free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size, c->ma);
+ free_all(c, llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
atomic_set(&c->call_rcu_in_progress, 0);
} else {
call_rcu_hurry(&c->rcu, __free_by_rcu);
@@ -546,7 +546,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
}
@@ -569,7 +568,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
@@ -622,7 +620,6 @@ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
@@ -631,7 +628,7 @@ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
return 0;
}
-static void drain_mem_cache(struct bpf_mem_cache *c, struct bpf_mem_alloc *ma)
+static void drain_mem_cache(struct bpf_mem_cache *c)
{
bool percpu = !!c->percpu_size;
@@ -642,13 +639,13 @@ static void drain_mem_cache(struct bpf_mem_cache *c, struct bpf_mem_alloc *ma)
* Except for waiting_for_gp_ttrace list, there are no concurrent operations
* on these lists, so it is safe to use __llist_del_all().
*/
- free_all(llist_del_all(&c->free_by_rcu_ttrace), percpu, ma);
- free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu, ma);
- free_all(__llist_del_all(&c->free_llist), percpu, ma);
- free_all(__llist_del_all(&c->free_llist_extra), percpu, ma);
- free_all(__llist_del_all(&c->free_by_rcu), percpu, ma);
- free_all(__llist_del_all(&c->free_llist_extra_rcu), percpu, ma);
- free_all(llist_del_all(&c->waiting_for_gp), percpu, ma);
+ free_all(c, llist_del_all(&c->free_by_rcu_ttrace), percpu);
+ free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), percpu);
+ free_all(c, __llist_del_all(&c->free_llist), percpu);
+ free_all(c, __llist_del_all(&c->free_llist_extra), percpu);
+ free_all(c, __llist_del_all(&c->free_by_rcu), percpu);
+ free_all(c, __llist_del_all(&c->free_llist_extra_rcu), percpu);
+ free_all(c, llist_del_all(&c->waiting_for_gp), percpu);
}
static void check_mem_cache(struct bpf_mem_cache *c)
@@ -687,6 +684,9 @@ static void check_leaked_objs(struct bpf_mem_alloc *ma)
static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
{
+ /* We can free dtor ctx only once all callbacks are done using it. */
+ if (ma->dtor_ctx_free)
+ ma->dtor_ctx_free(ma->dtor_ctx);
check_leaked_objs(ma);
free_percpu(ma->cache);
free_percpu(ma->caches);
@@ -758,7 +758,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
c = per_cpu_ptr(ma->cache, cpu);
WRITE_ONCE(c->draining, true);
irq_work_sync(&c->refill_work);
- drain_mem_cache(c, ma);
+ drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
@@ -773,7 +773,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
c = &cc->cache[i];
WRITE_ONCE(c->draining, true);
irq_work_sync(&c->refill_work);
- drain_mem_cache(c, ma);
+ drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
@@ -1022,9 +1022,31 @@ int bpf_mem_alloc_check_size(bool percpu, size_t size)
return 0;
}
-void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
- void (*dtor)(void *obj, void *ctx), void *ctx)
+void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, void (*dtor)(void *obj, void *ctx),
+ void (*dtor_ctx_free)(void *ctx), void *ctx)
{
- ma->dtor = dtor;
+ struct bpf_mem_caches *cc;
+ struct bpf_mem_cache *c;
+ int cpu, i;
+
+ ma->dtor_ctx_free = dtor_ctx_free;
ma->dtor_ctx = ctx;
+
+ if (ma->cache) {
+ for_each_possible_cpu(cpu) {
+ c = per_cpu_ptr(ma->cache, cpu);
+ c->dtor = dtor;
+ c->dtor_ctx = ctx;
+ }
+ }
+ if (ma->caches) {
+ for_each_possible_cpu(cpu) {
+ cc = per_cpu_ptr(ma->caches, cpu);
+ for (i = 0; i < NUM_CACHES; i++) {
+ c = &cc->cache[i];
+ c->dtor = dtor;
+ c->dtor_ctx = ctx;
+ }
+ }
+ }
}
^ permalink raw reply related [flat|nested] 8+ messages in thread

* Re: [syzbot ci] Re: Close race in freeing special fields and map value
2026-02-26 18:28 ` Kumar Kartikeya Dwivedi
@ 2026-02-26 18:30 ` syzbot ci
0 siblings, 0 replies; 8+ messages in thread
From: syzbot ci @ 2026-02-26 18:30 UTC (permalink / raw)
To: memxor
Cc: andrii, ast, bpf, daniel, eddyz87, kernel-team, kkd, martin.lau,
memxor, paulmck, syzbot, syzkaller-bugs, yatsenko
Unknown command
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] please work syzbot
2026-02-26 9:31 ` [syzbot ci] Re: Close race in freeing special fields and map value syzbot ci
` (2 preceding siblings ...)
2026-02-26 18:28 ` Kumar Kartikeya Dwivedi
@ 2026-02-26 18:36 ` Kumar Kartikeya Dwivedi
2026-02-26 18:42 ` syzbot ci
3 siblings, 1 reply; 8+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-02-26 18:36 UTC (permalink / raw)
To: syzbot+ci3826af8b4f91bf97
Cc: andrii, ast, bpf, daniel, eddyz87, kernel-team, kkd, martin.lau,
memxor, paulmck, syzbot, syzkaller-bugs, yatsenko
Main things to remember:
- dtor is set from map_check_btf because the record cannot be inspected in
  alloc.
- set dtor for each cache individually.
- free ctx after callbacks are done running.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
#syz test
include/linux/bpf_mem_alloc.h | 6 ++-
kernel/bpf/hashtab.c | 86 +++++++++++++++++++++++++++--------
kernel/bpf/memalloc.c | 70 ++++++++++++++++++----------
3 files changed, 118 insertions(+), 44 deletions(-)
diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index 7517eaf94ac9..4ce0d27f8ea2 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -14,7 +14,7 @@ struct bpf_mem_alloc {
struct obj_cgroup *objcg;
bool percpu;
struct work_struct work;
- void (*dtor)(void *obj, void *ctx);
+ void (*dtor_ctx_free)(void *ctx);
void *dtor_ctx;
};
@@ -35,7 +35,9 @@ int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma, struct obj_cgroup *objcg
int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
- void (*dtor)(void *obj, void *ctx), void *ctx);
+ void (*dtor)(void *obj, void *ctx),
+ void (*dtor_ctx_free)(void *ctx),
+ void *ctx);
/* Check the allocation size for kmalloc equivalent allocator */
int bpf_mem_alloc_check_size(bool percpu, size_t size);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 74f7a6f44c50..582f0192b7e1 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -125,6 +125,11 @@ struct htab_elem {
char key[] __aligned(8);
};
+struct htab_btf_record {
+ struct btf_record *record;
+ u32 key_size;
+};
+
static inline bool htab_is_prealloc(const struct bpf_htab *htab)
{
return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
@@ -459,28 +464,80 @@ static int htab_map_alloc_check(union bpf_attr *attr)
static void htab_mem_dtor(void *obj, void *ctx)
{
- struct bpf_htab *htab = ctx;
+ struct htab_btf_record *hrec = ctx;
struct htab_elem *elem = obj;
void *map_value;
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(hrec->record))
return;
- map_value = htab_elem_value(elem, htab->map.key_size);
- bpf_obj_free_fields(htab->map.record, map_value);
+ map_value = htab_elem_value(elem, hrec->key_size);
+ bpf_obj_free_fields(hrec->record, map_value);
}
static void htab_pcpu_mem_dtor(void *obj, void *ctx)
{
- struct bpf_htab *htab = ctx;
void __percpu *pptr = *(void __percpu **)obj;
+ struct htab_btf_record *hrec = ctx;
int cpu;
- if (IS_ERR_OR_NULL(htab->map.record))
+ if (IS_ERR_OR_NULL(hrec->record))
return;
for_each_possible_cpu(cpu)
- bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
+ bpf_obj_free_fields(hrec->record, per_cpu_ptr(pptr, cpu));
+}
+
+static void htab_dtor_ctx_free(void *ctx)
+{
+ struct htab_btf_record *hrec = ctx;
+
+ btf_record_free(hrec->record);
+ kfree(ctx);
+}
+
+static int htab_set_dtor(const struct bpf_htab *htab, void (*dtor)(void *, void *))
+{
+ u32 key_size = htab->map.key_size;
+ const struct bpf_mem_alloc *ma;
+ struct htab_btf_record *hrec;
+ int err;
+
+ /* No need for dtors. */
+ if (IS_ERR_OR_NULL(htab->map.record))
+ return 0;
+
+ hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
+ if (!hrec)
+ return -ENOMEM;
+ hrec->key_size = key_size;
+ hrec->record = btf_record_dup(htab->map.record);
+ if (IS_ERR(hrec->record)) {
+ err = PTR_ERR(hrec->record);
+ kfree(hrec);
+ return err;
+ }
+ ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
+ /* Kinda sad, but cast away const-ness since we change ma->dtor. */
+ bpf_mem_alloc_set_dtor((struct bpf_mem_alloc *)ma, dtor, htab_dtor_ctx_free, hrec);
+ return 0;
+}
+
+static int htab_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+ const struct btf_type *key_type, const struct btf_type *value_type)
+{
+ struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+ if (htab_is_prealloc(htab))
+ return 0;
+ /*
+ * We must set the dtor using this callback, as map's BTF record is not
+ * populated in htab_map_alloc(), so it will always appear as NULL.
+ */
+ if (htab_is_percpu(htab))
+ return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+ else
+ return htab_set_dtor(htab, htab_mem_dtor);
}
static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
@@ -595,17 +652,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
round_up(htab->map.value_size, 8), true);
if (err)
goto free_map_locked;
- /* See comment below */
- bpf_mem_alloc_set_dtor(&htab->pcpu_ma, htab_pcpu_mem_dtor, htab);
- } else {
- /*
- * Register the dtor unconditionally. map->record is
- * set by map_create() after map_alloc() returns, so it
- * is always NULL at this point. The dtor checks
- * IS_ERR_OR_NULL(htab->map.record) and becomes a no-op
- * for maps without special fields.
- */
- bpf_mem_alloc_set_dtor(&htab->ma, htab_mem_dtor, htab);
}
}
@@ -2318,6 +2364,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2340,6 +2387,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2519,6 +2567,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_percpu),
.map_btf_id = &htab_map_btf_ids[0],
@@ -2539,6 +2588,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
+ .map_check_btf = htab_map_check_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru_percpu),
.map_btf_id = &htab_map_btf_ids[0],
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 137e855c718b..682a9f34214b 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -102,7 +102,8 @@ struct bpf_mem_cache {
int percpu_size;
bool draining;
struct bpf_mem_cache *tgt;
- struct bpf_mem_alloc *ma;
+ void (*dtor)(void *obj, void *ctx);
+ void *dtor_ctx;
/* list of objects to be freed after RCU GP */
struct llist_head free_by_rcu;
@@ -261,15 +262,14 @@ static void free_one(void *obj, bool percpu)
kfree(obj);
}
-static int free_all(struct llist_node *llnode, bool percpu,
- struct bpf_mem_alloc *ma)
+static int free_all(struct bpf_mem_cache *c, struct llist_node *llnode, bool percpu)
{
struct llist_node *pos, *t;
int cnt = 0;
llist_for_each_safe(pos, t, llnode) {
- if (ma->dtor)
- ma->dtor((void *)pos + LLIST_NODE_SZ, ma->dtor_ctx);
+ if (c->dtor)
+ c->dtor((void *)pos + LLIST_NODE_SZ, c->dtor_ctx);
free_one(pos, percpu);
cnt++;
}
@@ -280,7 +280,7 @@ static void __free_rcu(struct rcu_head *head)
{
struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace);
- free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size, c->ma);
+ free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
atomic_set(&c->call_rcu_ttrace_in_progress, 0);
}
@@ -312,7 +312,7 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
if (unlikely(READ_ONCE(c->draining))) {
llnode = llist_del_all(&c->free_by_rcu_ttrace);
- free_all(llnode, !!c->percpu_size, c->ma);
+ free_all(c, llnode, !!c->percpu_size);
}
return;
}
@@ -421,7 +421,7 @@ static void check_free_by_rcu(struct bpf_mem_cache *c)
dec_active(c, &flags);
if (unlikely(READ_ONCE(c->draining))) {
- free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size, c->ma);
+ free_all(c, llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
atomic_set(&c->call_rcu_in_progress, 0);
} else {
call_rcu_hurry(&c->rcu, __free_by_rcu);
@@ -546,7 +546,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
}
@@ -569,7 +568,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
@@ -622,7 +620,6 @@ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
c->objcg = objcg;
c->percpu_size = percpu_size;
c->tgt = c;
- c->ma = ma;
init_refill_work(c);
prefill_mem_cache(c, cpu);
@@ -631,7 +628,7 @@ int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size)
return 0;
}
-static void drain_mem_cache(struct bpf_mem_cache *c, struct bpf_mem_alloc *ma)
+static void drain_mem_cache(struct bpf_mem_cache *c)
{
bool percpu = !!c->percpu_size;
@@ -642,13 +639,13 @@ static void drain_mem_cache(struct bpf_mem_cache *c, struct bpf_mem_alloc *ma)
* Except for waiting_for_gp_ttrace list, there are no concurrent operations
* on these lists, so it is safe to use __llist_del_all().
*/
- free_all(llist_del_all(&c->free_by_rcu_ttrace), percpu, ma);
- free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu, ma);
- free_all(__llist_del_all(&c->free_llist), percpu, ma);
- free_all(__llist_del_all(&c->free_llist_extra), percpu, ma);
- free_all(__llist_del_all(&c->free_by_rcu), percpu, ma);
- free_all(__llist_del_all(&c->free_llist_extra_rcu), percpu, ma);
- free_all(llist_del_all(&c->waiting_for_gp), percpu, ma);
+ free_all(c, llist_del_all(&c->free_by_rcu_ttrace), percpu);
+ free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), percpu);
+ free_all(c, __llist_del_all(&c->free_llist), percpu);
+ free_all(c, __llist_del_all(&c->free_llist_extra), percpu);
+ free_all(c, __llist_del_all(&c->free_by_rcu), percpu);
+ free_all(c, __llist_del_all(&c->free_llist_extra_rcu), percpu);
+ free_all(c, llist_del_all(&c->waiting_for_gp), percpu);
}
static void check_mem_cache(struct bpf_mem_cache *c)
@@ -687,6 +684,9 @@ static void check_leaked_objs(struct bpf_mem_alloc *ma)
static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
{
+ /* We can free dtor ctx only once all callbacks are done using it. */
+ if (ma->dtor_ctx_free)
+ ma->dtor_ctx_free(ma->dtor_ctx);
check_leaked_objs(ma);
free_percpu(ma->cache);
free_percpu(ma->caches);
@@ -758,7 +758,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
c = per_cpu_ptr(ma->cache, cpu);
WRITE_ONCE(c->draining, true);
irq_work_sync(&c->refill_work);
- drain_mem_cache(c, ma);
+ drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
@@ -773,7 +773,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
c = &cc->cache[i];
WRITE_ONCE(c->draining, true);
irq_work_sync(&c->refill_work);
- drain_mem_cache(c, ma);
+ drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_ttrace_in_progress);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
}
@@ -1022,9 +1022,31 @@ int bpf_mem_alloc_check_size(bool percpu, size_t size)
return 0;
}
-void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
- void (*dtor)(void *obj, void *ctx), void *ctx)
+void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, void (*dtor)(void *obj, void *ctx),
+ void (*dtor_ctx_free)(void *ctx), void *ctx)
{
- ma->dtor = dtor;
+ struct bpf_mem_caches *cc;
+ struct bpf_mem_cache *c;
+ int cpu, i;
+
+ ma->dtor_ctx_free = dtor_ctx_free;
ma->dtor_ctx = ctx;
+
+ if (ma->cache) {
+ for_each_possible_cpu(cpu) {
+ c = per_cpu_ptr(ma->cache, cpu);
+ c->dtor = dtor;
+ c->dtor_ctx = ctx;
+ }
+ }
+ if (ma->caches) {
+ for_each_possible_cpu(cpu) {
+ cc = per_cpu_ptr(ma->caches, cpu);
+ for (i = 0; i < NUM_CACHES; i++) {
+ c = &cc->cache[i];
+ c->dtor = dtor;
+ c->dtor_ctx = ctx;
+ }
+ }
+ }
}
--
2.47.3
^ permalink raw reply related [flat|nested] 8+ messages in thread

* Re: [PATCH] please work syzbot
2026-02-26 18:36 ` [PATCH] please work syzbot Kumar Kartikeya Dwivedi
@ 2026-02-26 18:42 ` syzbot ci
0 siblings, 0 replies; 8+ messages in thread
From: syzbot ci @ 2026-02-26 18:42 UTC (permalink / raw)
To: memxor
Cc: andrii, ast, bpf, daniel, eddyz87, kernel-team, kkd, martin.lau,
memxor, paulmck, syzbot, syzkaller-bugs, yatsenko
Unknown command
^ permalink raw reply [flat|nested] 8+ messages in thread