[PATCH bpf-next v3 0/4] Fix kptr dtor deadlock

BPF List
 help / color / mirror / Atom feed

* [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock
@ 2026-06-09 20:25 Kumar Kartikeya Dwivedi
  2026-06-09 20:25 ` [PATCH bpf-next v3 1/4] bpf: Reject bpf_obj_drop() from tracing progs Kumar Kartikeya Dwivedi
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-06-09 20:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Emil Tsalapatis, Justin Suess, kkd, kernel-team

Referenced kptr destruction can run from tracing/NMI contexts through
bpf_obj_drop() and map value update/delete paths, reaching NMI-unsafe
special field teardown and deadlocks. Justin reported the issue and
iterated on fixes in [0]-[2], and also confirmed the bpf_obj_drop()
reproducer in [3].

This series rejects unsafe obj drops from non-iterator tracing programs,
limits map value recycle to NMI-safe field cancellation, and adds
focused selftests for the obj_drop(), NMI delete, and recycle teardown
cases.

See patches for details.

  [0]: https://lore.kernel.org/bpf/20260505150851.3090688-1-utilityemal77@gmail.com
  [1]: https://lore.kernel.org/bpf/20260507175453.1140400-1-utilityemal77@gmail.com
  [2]: https://lore.kernel.org/bpf/20260519011450.1144935-1-utilityemal77@gmail.com
  [3]: https://lore.kernel.org/bpf/agyG3eQwgmoJwmj2@suesslenovo

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20260609093719.2858096-1-memxor@gmail.com

 * Replace bpf_obj_cancel_fields() to use bpf_map_free_internal_structs(). (Mykyta)
 * Fix CI failures.

v1 -> v2
v1: https://lore.kernel.org/bpf/20260608144841.1732406-1-memxor@gmail.com

 * Drop is_tracing_prog_type() fix due to compat breakage, revisit separately.
 * Rework bpf_obj_drop() fix to additionally reject non-iter tracing progs.

Justin Suess (2):
  bpf: Reject bpf_obj_drop() from tracing progs
  bpf: Cancel special fields on map value recycle

Kumar Kartikeya Dwivedi (2):
  selftests/bpf: Exercise unsafe obj drops from tracing progs
  selftests/bpf: Exercise kptr map update lifetime

 include/linux/bpf.h                           | 30 +++++++
 kernel/bpf/arraymap.c                         |  8 +-
 kernel/bpf/hashtab.c                          | 32 ++++---
 kernel/bpf/syscall.c                          |  5 ++
 kernel/bpf/verifier.c                         | 17 ++++
 .../selftests/bpf/prog_tests/htab_update.c    |  4 +-
 .../selftests/bpf/prog_tests/linked_list.c    | 33 +++----
 .../selftests/bpf/prog_tests/map_kptr.c       | 66 ++++++++++++--
 .../bpf/prog_tests/refcounted_kptr.c          |  8 +-
 .../selftests/bpf/prog_tests/task_kfunc.c     | 42 ++++++++-
 .../testing/selftests/bpf/progs/htab_update.c |  4 +-
 .../testing/selftests/bpf/progs/linked_list.c | 71 +++++++++++++++
 tools/testing/selftests/bpf/progs/map_kptr.c  | 89 ++++++++++++++++++-
 .../selftests/bpf/progs/refcounted_kptr.c     | 20 ++++-
 .../selftests/bpf/progs/task_kfunc_failure.c  | 40 +++++++++
 .../selftests/bpf/progs/task_kfunc_success.c  | 13 ++-
 16 files changed, 421 insertions(+), 61 deletions(-)


base-commit: c15261b1bba0bb7921552cdd86c8b0202697a8f9
-- 
2.53.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 1/4] bpf: Reject bpf_obj_drop() from tracing progs
  2026-06-09 20:25 [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock Kumar Kartikeya Dwivedi
@ 2026-06-09 20:25 ` Kumar Kartikeya Dwivedi
  2026-06-09 20:58   ` bot+bpf-ci
  2026-06-09 20:25 ` [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-06-09 20:25 UTC (permalink / raw)
  To: bpf
  Cc: Justin Suess, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Emil Tsalapatis, kkd,
	kernel-team

From: Justin Suess <utilityemal77@gmail.com>

bpf_obj_drop() runs bpf_obj_free_fields() synchronously for
program-allocated objects. When such an object contains NMI unsafe
fields, tracing programs that can run from arbitrary instrumented
context can reach that destruction from unsafe contexts, including NMI.

NMI is likely one instance of this problem, and other instances would
include possible unsafe reentrancy. Deferring bpf_obj_drop() is not
appealing either: it would add delayed-free machinery to a release
operation that otherwise has straightforward synchronous ownership
semantics.

Reject bpf_obj_drop() and bpf_percpu_obj_drop() from tracing programs
that may run from unsafe contexts unless every field in the object's BTF
record is explicitly NMI safe. Do not reject sleepable
BPF_PROG_TYPE_TRACING programs, since they are not the arbitrary/NMI
contexts that motivate the restriction.

Note that while bpf_rb_root and bpf_list_head would be NMI safe on their
own to free, the objects recursively held by them may not be; be
conservative and just mark them as not NMI safe for now.

Use a whitelist for the NMI-safe field set instead of listing only known
NMI unsafe fields. Locks, async fields, unreferenced kptrs, and
refcounts are known to be NMI safe because their destruction is either a
no-op, simple state reset, or async cancellation. Referenced kptrs,
percpu referenced kptrs, uptrs, graph roots, graph nodes, and any future
field type are rejected until audited for arbitrary tracing and NMI
contexts. This is less susceptible to future changes in fields that were
previously safe by exclusion, and to new fields being added without
updating this check.

Convert the existing recursive local-object drop success case to a
syscall program in the same commit, since this verifier change makes the
old tracing program form invalid. The test still exercises
bpf_obj_drop() releasing a referenced task kptr from a safe program
type.

Fixes: ac9f06050a35 ("bpf: Introduce bpf_obj_drop")
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           | 29 +++++++++++++
 kernel/bpf/verifier.c                         | 17 ++++++++
 .../selftests/bpf/prog_tests/task_kfunc.c     | 42 ++++++++++++++++++-
 .../selftests/bpf/progs/task_kfunc_success.c  | 13 +++---
 4 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 62bba7a4876f..0654d2ffadc1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -492,6 +492,35 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
 	return rec->field_mask & type;
 }
 
+static inline bool btf_field_is_nmi_safe(enum btf_field_type type)
+{
+	switch (type) {
+	case BPF_SPIN_LOCK:
+	case BPF_RES_SPIN_LOCK:
+	case BPF_TIMER:
+	case BPF_WORKQUEUE:
+	case BPF_TASK_WORK:
+	case BPF_KPTR_UNREF:
+	case BPF_REFCOUNT:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static inline bool btf_record_has_nmi_unsafe_fields(const struct btf_record *rec)
+{
+	int i;
+
+	if (IS_ERR_OR_NULL(rec))
+		return false;
+	for (i = 0; i < rec->cnt; i++) {
+		if (!btf_field_is_nmi_safe(rec->fields[i].type))
+			return true;
+	}
+	return false;
+}
+
 static inline void bpf_obj_init(const struct btf_record *rec, void *obj)
 {
 	int i;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 954b85609f32..eb46a81a8c51 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -205,6 +205,7 @@ static int release_reference_nomark(struct bpf_verifier_state *state, int id);
 static int release_reference(struct bpf_verifier_env *env, int id);
 static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
 static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env);
+static bool is_tracing_prog_type(enum bpf_prog_type type);
 static int ref_set_non_owning(struct bpf_verifier_env *env,
 			      struct bpf_reg_state *reg);
 static bool is_trusted_reg(struct bpf_verifier_env *env, const struct bpf_reg_state *reg);
@@ -12881,6 +12882,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			    int *insn_idx_p)
 {
 	bool sleepable, rcu_lock, rcu_unlock, preempt_disable, preempt_enable;
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 	struct bpf_reg_state *regs = cur_regs(env);
 	const char *func_name, *ptr_type_name;
 	const struct btf_type *t, *ptr_type;
@@ -12957,6 +12959,21 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	if (err < 0)
 		return err;
 
+	if ((is_bpf_obj_drop_kfunc(meta.func_id) ||
+	     is_bpf_percpu_obj_drop_kfunc(meta.func_id)) && (is_tracing_prog_type(prog_type) ||
+	     /* is_tracing_prog_type() for now doesn't cover non-iterator tracing progs. */
+	     (prog_type == BPF_PROG_TYPE_TRACING && env->prog->expected_attach_type != BPF_TRACE_ITER
+	      && !env->prog->sleepable))) {
+		struct btf_struct_meta *struct_meta;
+
+		struct_meta = btf_find_struct_meta(meta.arg_btf, meta.arg_btf_id);
+		if (struct_meta && btf_record_has_nmi_unsafe_fields(struct_meta->record)) {
+			verbose(env, "%s cannot be used in tracing programs on types with NMI unsafe fields\n",
+				func_name);
+			return -EINVAL;
+		}
+	}
+
 	if (is_bpf_rbtree_add_kfunc(meta.func_id)) {
 		err = push_callback_call(env, insn, insn_idx, meta.subprogno,
 					 set_rbtree_add_callback_state);
diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
index 83b90335967a..e6e95c1416e6 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
@@ -68,6 +68,36 @@ static void run_success_test(const char *prog_name)
 	task_kfunc_success__destroy(skel);
 }
 
+static void run_syscall_success_test(const char *prog_name)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	struct task_kfunc_success *skel;
+	struct bpf_program *prog;
+	int err;
+
+	skel = open_load_task_kfunc_skel();
+	if (!ASSERT_OK_PTR(skel, "open_load_skel"))
+		return;
+
+	if (!ASSERT_OK(skel->bss->err, "pre_run_err"))
+		goto cleanup;
+
+	prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+	if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+		goto cleanup;
+
+	err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
+	if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+		goto cleanup;
+	if (!ASSERT_EQ(opts.retval, 0, "retval"))
+		goto cleanup;
+
+	ASSERT_OK(skel->bss->err, "post_run_err");
+
+cleanup:
+	task_kfunc_success__destroy(skel);
+}
+
 static int run_vpid_test(void *prog_name)
 {
 	struct task_kfunc_success *skel;
@@ -140,7 +170,6 @@ static const char * const success_tests[] = {
 	"test_task_acquire_release_argument",
 	"test_task_acquire_release_current",
 	"test_task_acquire_leave_in_map",
-	"test_task_xchg_release",
 	"test_task_map_acquire_release",
 	"test_task_current_acquire_release",
 	"test_task_from_pid_arg",
@@ -151,6 +180,10 @@ static const char * const success_tests[] = {
 	"test_task_kfunc_flavor_relo_not_found",
 };
 
+static const char * const syscall_success_tests[] = {
+	"test_task_xchg_release",
+};
+
 static const char * const vpid_success_tests[] = {
 	"test_task_from_vpid_current",
 	"test_task_from_vpid_invalid",
@@ -167,6 +200,13 @@ void test_task_kfunc(void)
 		run_success_test(success_tests[i]);
 	}
 
+	for (i = 0; i < ARRAY_SIZE(syscall_success_tests); i++) {
+		if (!test__start_subtest(syscall_success_tests[i]))
+			continue;
+
+		run_syscall_success_test(syscall_success_tests[i]);
+	}
+
 	for (i = 0; i < ARRAY_SIZE(vpid_success_tests); i++) {
 		if (!test__start_subtest(vpid_success_tests[i]))
 			continue;
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_success.c b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
index 5fb4fc19d26a..d63a79ee33dc 100644
--- a/tools/testing/selftests/bpf/progs/task_kfunc_success.c
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
@@ -140,17 +140,17 @@ int BPF_PROG(test_task_acquire_leave_in_map, struct task_struct *task, u64 clone
 	return 0;
 }
 
-SEC("tp_btf/task_newtask")
-int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
+SEC("syscall")
+int test_task_xchg_release(const void *ctx)
 {
-	struct task_struct *kptr, *acquired;
+	struct task_struct *task, *kptr, *acquired;
 	struct __tasks_kfunc_map_value *v, *local;
 	int refcnt, refcnt_after_drop;
 	long status;
 
-	if (!is_test_kfunc_task())
-		return 0;
+	(void)ctx;
 
+	task = bpf_get_current_task_btf();
 	status = tasks_kfunc_map_insert(task);
 	if (status) {
 		err = 1;
@@ -191,7 +191,7 @@ int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
 		return 0;
 	}
 
-	/* Stash a copy into local kptr and check if it is released recursively */
+	/* Stash a copy into local kptr and check if it is released recursively. */
 	acquired = bpf_task_acquire(kptr);
 	if (!acquired) {
 		err = 7;
@@ -220,7 +220,6 @@ int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
 	}
 
 	bpf_task_release(kptr);
-
 	return 0;
 }
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 1/4] bpf: Reject bpf_obj_drop() from tracing progs
  2026-06-09 20:25 ` [PATCH bpf-next v3 1/4] bpf: Reject bpf_obj_drop() from tracing progs Kumar Kartikeya Dwivedi
@ 2026-06-09 20:58   ` bot+bpf-ci
  0 siblings, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-06-09 20:58 UTC (permalink / raw)
  To: memxor, bpf
  Cc: utilityemal77, ast, andrii, daniel, eddyz87, emil, kkd,
	kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2501 bytes --]

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 62bba7a4876f..0654d2ffadc1 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -492,6 +492,35 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
>  	return rec->field_mask & type;
>  }
>
> +static inline bool btf_field_is_nmi_safe(enum btf_field_type type)
> +{
> +	switch (type) {
> +	case BPF_SPIN_LOCK:
> +	case BPF_RES_SPIN_LOCK:
> +	case BPF_TIMER:
> +	case BPF_WORKQUEUE:
> +	case BPF_TASK_WORK:
> +	case BPF_KPTR_UNREF:
> +	case BPF_REFCOUNT:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}

Is it safe to include BPF_TIMER and BPF_WORKQUEUE in this NMI-safe
whitelist? Looking at the cleanup paths, dropping an object with these
fields calls bpf_timer_cancel_and_free() or bpf_wq_cancel_and_free(),
which synchronously invoke bpf_prog_put() via
bpf_async_update_prog_callback():

kernel/bpf/helpers.c:bpf_async_update_prog_callback()
    bpf_prog_put()
        __bpf_prog_put()
            schedule_work()

If this drops the last reference to the BPF program while in NMI context,
__bpf_prog_put() falls back to schedule_work() since irqs are disabled.
Since schedule_work() takes a raw spin lock on the workqueue pool, couldn't
this hard deadlock if the NMI interrupted another thread that was already
holding the workqueue pool lock?

BPF_TASK_WORK safely defers via irq_work_queue(), but these fields do not
appear to have the same protection.

This concern was raised by sashiko-bot in v1:
https://lore.kernel.org/bpf/20260608154008.27E011F00893@smtp.kernel.org/

Would it be safer to exclude BPF_TIMER and BPF_WORKQUEUE from the
NMI-safe whitelist?

> +
> +static inline bool btf_record_has_nmi_unsafe_fields(const struct btf_record *rec)
> +{
> +	int i;
> +
> +	if (IS_ERR_OR_NULL(rec))
> +		return false;
> +	for (i = 0; i < rec->cnt; i++) {
> +		if (!btf_field_is_nmi_safe(rec->fields[i].type))
> +			return true;
> +	}
> +	return false;
> +}

This isn't a bug, but these helper functions were suggested to be moved
from the header file into verifier.c since they are only used there:
https://lore.kernel.org/bpf/6461e40c-d5a9-41a5-aaba-bf063de67d3f@gmail.com/

Is there a reason to keep them in the header?

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27234422852

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle
  2026-06-09 20:25 [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock Kumar Kartikeya Dwivedi
  2026-06-09 20:25 ` [PATCH bpf-next v3 1/4] bpf: Reject bpf_obj_drop() from tracing progs Kumar Kartikeya Dwivedi
@ 2026-06-09 20:25 ` Kumar Kartikeya Dwivedi
  2026-06-09 20:44   ` sashiko-bot
  2026-06-09 21:09   ` bot+bpf-ci
  2026-06-09 20:25 ` [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-06-09 20:25 UTC (permalink / raw)
  To: bpf
  Cc: Justin Suess, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Eduard Zingerman, Emil Tsalapatis, kkd,
	kernel-team

From: Justin Suess <utilityemal77@gmail.com>

Map update and delete paths currently call bpf_obj_free_fields() when a
value is being replaced or recycled. That makes field destruction depend
on the context of the update/delete operation. For tracing programs this
can include NMI context, where referenced kptr destructors, uptr
unpinning, and graph root destruction are not generally safe.

Introduce bpf_obj_cancel_fields() for the reusable-value path. It only
performs NMI-safe cleanup for timer, workqueue, and task_work fields.
Fields that need full destruction are left attached to the recycled value
and are destroyed by the final cleanup path instead.

Switch array and hashtab update/delete/recycle paths to this cancel
helper. Keep bpf_obj_free_fields() for final map destruction and for
bpf_mem_alloc destructors. Preallocated hashtabs do not have allocator
destructors, so teardown continues to walk the normal and extra elements
and fully destroy their fields.

This deliberately relaxes the eager-free semantics of map update/delete
for special fields. Programs that relied on a recycled map slot becoming
empty immediately after update/delete were relying on behavior that
cannot be implemented safely from every BPF execution context without
offloading arbitrary destructors.

There is a chance this change breaks programs making assumptions
regarding the eager freeing of fields. If so, we can relax semantics to
cancellation only when irqs_disabled() is true in the future. However,
theoretically, map values that get reused eagerly already have weaker
guarantees as parallel users can recreate freed fields before the new
element becomes visible again.

Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  1 +
 kernel/bpf/arraymap.c                         |  8 +--
 kernel/bpf/hashtab.c                          | 32 +++++----
 kernel/bpf/syscall.c                          |  5 ++
 .../selftests/bpf/prog_tests/htab_update.c    |  4 +-
 .../selftests/bpf/prog_tests/linked_list.c    | 33 ++++-----
 .../selftests/bpf/prog_tests/map_kptr.c       | 10 +--
 .../bpf/prog_tests/refcounted_kptr.c          |  8 ++-
 .../testing/selftests/bpf/progs/htab_update.c |  4 +-
 .../testing/selftests/bpf/progs/linked_list.c | 71 +++++++++++++++++++
 .../selftests/bpf/progs/refcounted_kptr.c     | 20 +++++-
 11 files changed, 146 insertions(+), 50 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0654d2ffadc1..56f5da2b437f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2717,6 +2717,7 @@ bool btf_record_equal(const struct btf_record *rec_a, const struct btf_record *r
 void bpf_obj_free_timer(const struct btf_record *rec, void *obj);
 void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj);
 void bpf_obj_free_task_work(const struct btf_record *rec, void *obj);
+void bpf_obj_cancel_fields(struct bpf_map *map, void *obj);
 void bpf_obj_free_fields(const struct btf_record *rec, void *obj);
 void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu);
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index e6271a2bf6d6..248b4818178c 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -384,7 +384,7 @@ static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 		val = this_cpu_ptr(array->pptrs[index & array->index_mask]);
 		copy_map_value(map, val, value);
-		bpf_obj_free_fields(array->map.record, val);
+		bpf_obj_cancel_fields(map, val);
 	} else {
 		val = array->value +
 			(u64)array->elem_size * (index & array->index_mask);
@@ -392,7 +392,7 @@ static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
 			copy_map_value_locked(map, val, value, false);
 		else
 			copy_map_value(map, val, value);
-		bpf_obj_free_fields(array->map.record, val);
+		bpf_obj_cancel_fields(map, val);
 	}
 	return 0;
 }
@@ -432,14 +432,14 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
 		cpu = map_flags >> 32;
 		ptr = per_cpu_ptr(pptr, cpu);
 		copy_map_value(map, ptr, value);
-		bpf_obj_free_fields(array->map.record, ptr);
+		bpf_obj_cancel_fields(map, ptr);
 		goto unlock;
 	}
 	for_each_possible_cpu(cpu) {
 		ptr = per_cpu_ptr(pptr, cpu);
 		val = (map_flags & BPF_F_ALL_CPUS) ? value : value + size * cpu;
 		copy_map_value(map, ptr, val);
-		bpf_obj_free_fields(array->map.record, ptr);
+		bpf_obj_cancel_fields(map, ptr);
 	}
 unlock:
 	rcu_read_unlock();
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b4366cad3cfa..9f394e1aa2e8 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -243,6 +243,10 @@ static void htab_free_prealloced_fields(struct bpf_htab *htab)
 
 	if (IS_ERR_OR_NULL(htab->map.record))
 		return;
+	/*
+	 * Preallocated maps do not have a bpf_mem_alloc destructor, so fully
+	 * destroy every element, including the extra elements.
+	 */
 	if (htab_has_extra_elems(htab))
 		num_entries += num_possible_cpus();
 	for (i = 0; i < num_entries; i++) {
@@ -833,8 +837,8 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
 	return insn - insn_buf;
 }
 
-static void check_and_free_fields(struct bpf_htab *htab,
-				  struct htab_elem *elem)
+static void check_and_cancel_fields(struct bpf_htab *htab,
+				    struct htab_elem *elem)
 {
 	if (IS_ERR_OR_NULL(htab->map.record))
 		return;
@@ -844,11 +848,11 @@ static void check_and_free_fields(struct bpf_htab *htab,
 		int cpu;
 
 		for_each_possible_cpu(cpu)
-			bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
+			bpf_obj_cancel_fields(&htab->map, per_cpu_ptr(pptr, cpu));
 	} else {
 		void *map_value = htab_elem_value(elem, htab->map.key_size);
 
-		bpf_obj_free_fields(htab->map.record, map_value);
+		bpf_obj_cancel_fields(&htab->map, map_value);
 	}
 }
 
@@ -883,7 +887,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 	htab_unlock_bucket(b, flags);
 
 	if (l == tgt_l)
-		check_and_free_fields(htab, l);
+		check_and_cancel_fields(htab, l);
 	return l == tgt_l;
 }
 
@@ -948,7 +952,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 
 static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
 {
-	check_and_free_fields(htab, l);
+	check_and_cancel_fields(htab, l);
 
 	if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
 		bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
@@ -1001,7 +1005,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 
 	if (htab_is_prealloc(htab)) {
 		bpf_map_dec_elem_count(&htab->map);
-		check_and_free_fields(htab, l);
+		check_and_cancel_fields(htab, l);
 		pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
@@ -1018,7 +1022,7 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 		/* copy true value_size bytes */
 		ptr = this_cpu_ptr(pptr);
 		copy_map_value(&htab->map, ptr, value);
-		bpf_obj_free_fields(htab->map.record, ptr);
+		bpf_obj_cancel_fields(&htab->map, ptr);
 	} else {
 		u32 size = round_up(htab->map.value_size, 8);
 		void *val;
@@ -1028,7 +1032,7 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 			cpu = map_flags >> 32;
 			ptr = per_cpu_ptr(pptr, cpu);
 			copy_map_value(&htab->map, ptr, value);
-			bpf_obj_free_fields(htab->map.record, ptr);
+			bpf_obj_cancel_fields(&htab->map, ptr);
 			return;
 		}
 
@@ -1036,7 +1040,7 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 			ptr = per_cpu_ptr(pptr, cpu);
 			val = (map_flags & BPF_F_ALL_CPUS) ? value : value + size * cpu;
 			copy_map_value(&htab->map, ptr, val);
-			bpf_obj_free_fields(htab->map.record, ptr);
+			bpf_obj_cancel_fields(&htab->map, ptr);
 		}
 	}
 }
@@ -1252,11 +1256,11 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (l_old) {
 		hlist_nulls_del_rcu(&l_old->hash_node);
 
-		/* l_old has already been stashed in htab->extra_elems, free
-		 * its special fields before it is available for reuse.
+		/* l_old has already been stashed in htab->extra_elems, cancel
+		 * its reusable special fields before it is available for reuse.
 		 */
 		if (htab_is_prealloc(htab))
-			check_and_free_fields(htab, l_old);
+			check_and_cancel_fields(htab, l_old);
 	}
 	htab_unlock_bucket(b, flags);
 	if (l_old && !htab_is_prealloc(htab))
@@ -1269,7 +1273,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 
 static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
 {
-	check_and_free_fields(htab, elem);
+	check_and_cancel_fields(htab, elem);
 	bpf_map_dec_elem_count(&htab->map);
 	bpf_lru_push_free(&htab->lru, &elem->lru_node);
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d4188a992bd8..7ed949f70f82 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -808,6 +808,11 @@ void bpf_obj_free_task_work(const struct btf_record *rec, void *obj)
 	bpf_task_work_cancel_and_free(obj + rec->task_work_off);
 }
 
+void bpf_obj_cancel_fields(struct bpf_map *map, void *obj)
+{
+	bpf_map_free_internal_structs(map, obj);
+}
+
 void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 {
 	const struct btf_field *fields;
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
index ea1a6766fbe9..0a28d4346924 100644
--- a/tools/testing/selftests/bpf/prog_tests/htab_update.c
+++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
@@ -23,7 +23,7 @@ static void test_reenter_update(void)
 	if (!ASSERT_OK_PTR(skel, "htab_update__open"))
 		return;
 
-	bpf_program__set_autoload(skel->progs.bpf_obj_free_fields, true);
+	bpf_program__set_autoload(skel->progs.bpf_obj_cancel_fields, true);
 	err = htab_update__load(skel);
 	if (!ASSERT_TRUE(!err, "htab_update__load") || err)
 		goto out;
@@ -50,7 +50,7 @@ static void test_reenter_update(void)
 	/*
 	 * Second update: replace existing element with same key and trigger
 	 * the reentrancy of bpf_map_update_elem().
-	 * check_and_free_fields() calls bpf_obj_free_fields() on the old
+	 * check_and_cancel_fields() calls bpf_obj_cancel_fields() on the old
 	 * value, which is where fentry program runs and performs a nested
 	 * bpf_map_update_elem(), triggering -EDEADLK.
 	 */
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index dbff099860ba..8defea0253ed 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -131,13 +131,14 @@ static void test_linked_list_fail_prog(const char *prog_name, const char *err_ms
 	linked_list_fail__destroy(skel);
 }
 
-static void clear_fields(struct bpf_map *map)
+static void clear_fields(struct bpf_program *prog)
 {
-	char buf[24];
-	int key = 0;
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	int ret;
 
-	memset(buf, 0xff, sizeof(buf));
-	ASSERT_OK(bpf_map__update_elem(map, &key, sizeof(key), buf, sizeof(buf), 0), "check_and_free_fields");
+	ret = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
+	ASSERT_OK(ret, "clear_fields");
+	ASSERT_OK(opts.retval, "clear_fields retval");
 }
 
 enum {
@@ -170,31 +171,31 @@ static void test_linked_list_success(int mode, bool leave_in_map)
 	ASSERT_OK(ret, "map_list_push_pop");
 	ASSERT_OK(opts.retval, "map_list_push_pop retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.array_map);
+		clear_fields(skel->progs.clear_map_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop), &opts);
 	ASSERT_OK(ret, "inner_map_list_push_pop");
 	ASSERT_OK(opts.retval, "inner_map_list_push_pop retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.inner_map);
+		clear_fields(skel->progs.clear_inner_map_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop), &opts);
 	ASSERT_OK(ret, "global_list_push_pop");
 	ASSERT_OK(opts.retval, "global_list_push_pop retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.bss_A);
+		clear_fields(skel->progs.clear_global_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_nested), &opts);
 	ASSERT_OK(ret, "global_list_push_pop_nested");
 	ASSERT_OK(opts.retval, "global_list_push_pop_nested retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.bss_A);
+		clear_fields(skel->progs.clear_global_nested_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_array_push_pop), &opts);
 	ASSERT_OK(ret, "global_list_array_push_pop");
 	ASSERT_OK(opts.retval, "global_list_array_push_pop retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.bss_A);
+		clear_fields(skel->progs.clear_global_array_list);
 
 	if (mode == PUSH_POP)
 		goto end;
@@ -204,19 +205,19 @@ static void test_linked_list_success(int mode, bool leave_in_map)
 	ASSERT_OK(ret, "map_list_push_pop_multiple");
 	ASSERT_OK(opts.retval, "map_list_push_pop_multiple retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.array_map);
+		clear_fields(skel->progs.clear_map_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_push_pop_multiple), &opts);
 	ASSERT_OK(ret, "inner_map_list_push_pop_multiple");
 	ASSERT_OK(opts.retval, "inner_map_list_push_pop_multiple retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.inner_map);
+		clear_fields(skel->progs.clear_inner_map_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_push_pop_multiple), &opts);
 	ASSERT_OK(ret, "global_list_push_pop_multiple");
 	ASSERT_OK(opts.retval, "global_list_push_pop_multiple retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.bss_A);
+		clear_fields(skel->progs.clear_global_list);
 
 	if (mode == PUSH_POP_MULT)
 		goto end;
@@ -226,19 +227,19 @@ static void test_linked_list_success(int mode, bool leave_in_map)
 	ASSERT_OK(ret, "map_list_in_list");
 	ASSERT_OK(opts.retval, "map_list_in_list retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.array_map);
+		clear_fields(skel->progs.clear_map_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.inner_map_list_in_list), &opts);
 	ASSERT_OK(ret, "inner_map_list_in_list");
 	ASSERT_OK(opts.retval, "inner_map_list_in_list retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.inner_map);
+		clear_fields(skel->progs.clear_inner_map_list);
 
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_in_list), &opts);
 	ASSERT_OK(ret, "global_list_in_list");
 	ASSERT_OK(opts.retval, "global_list_in_list retval");
 	if (!leave_in_map)
-		clear_fields(skel->maps.bss_A);
+		clear_fields(skel->progs.clear_global_list);
 end:
 	linked_list__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
index 03b46f17cf53..ec6f2f2e8308 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
@@ -51,7 +51,6 @@ static void test_map_kptr_success(bool test_run)
 	ret = bpf_map__update_elem(skel->maps.array_map,
 				   &key, sizeof(key), buf, sizeof(buf), 0);
 	ASSERT_OK(ret, "array_map update");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
@@ -59,49 +58,42 @@ static void test_map_kptr_success(bool test_run)
 	ret = bpf_map__update_elem(skel->maps.pcpu_array_map,
 				   &key, sizeof(key), pbuf, cpu * sizeof(buf), 0);
 	ASSERT_OK(ret, "pcpu_array_map update");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
 
 	ret = bpf_map__delete_elem(skel->maps.hash_map, &key, sizeof(key), 0);
 	ASSERT_OK(ret, "hash_map delete");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
 
 	ret = bpf_map__delete_elem(skel->maps.pcpu_hash_map, &key, sizeof(key), 0);
 	ASSERT_OK(ret, "pcpu_hash_map delete");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
 
 	ret = bpf_map__delete_elem(skel->maps.hash_malloc_map, &key, sizeof(key), 0);
 	ASSERT_OK(ret, "hash_malloc_map delete");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
 
 	ret = bpf_map__delete_elem(skel->maps.pcpu_hash_malloc_map, &key, sizeof(key), 0);
 	ASSERT_OK(ret, "pcpu_hash_malloc_map delete");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
 
 	ret = bpf_map__delete_elem(skel->maps.lru_hash_map, &key, sizeof(key), 0);
 	ASSERT_OK(ret, "lru_hash_map delete");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
 
 	ret = bpf_map__delete_elem(skel->maps.lru_pcpu_hash_map, &key, sizeof(key), 0);
 	ASSERT_OK(ret, "lru_pcpu_hash_map delete");
-	skel->data->ref--;
 	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_map_kptr_ref3), &opts);
 	ASSERT_OK(ret, "test_map_kptr_ref3 refcount");
 	ASSERT_OK(opts.retval, "test_map_kptr_ref3 retval");
@@ -175,7 +167,7 @@ void serial_test_map_kptr(void)
 		ASSERT_OK(kern_sync_rcu(), "sync rcu");
 		wait_for_map_release();
 
-		/* Observe refcount dropping to 1 on synchronous delete elem */
+		/* Observe refcount dropping to 1 on map release. */
 		test_map_kptr_success(true);
 	}
 
diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index d2c0542716a8..1737eba34323 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -57,6 +57,7 @@ void test_percpu_hash_refcounted_kptr_refcount_leak(void)
 		    .data_size_in = sizeof(pkt_v4),
 		    .repeat = 1,
 	);
+	LIBBPF_OPTS(bpf_test_run_opts, syscall_opts);
 
 	cpu_nr = libbpf_num_possible_cpus();
 	if (!ASSERT_GT(cpu_nr, 0, "libbpf_num_possible_cpus"))
@@ -87,8 +88,11 @@ void test_percpu_hash_refcounted_kptr_refcount_leak(void)
 	if (!ASSERT_EQ(opts.retval, 2, "opts.retval"))
 		goto out;
 
-	err = bpf_map__update_elem(map, &key, sizeof(key), values, values_sz, 0);
-	if (!ASSERT_OK(err, "bpf_map__update_elem"))
+	fd = bpf_program__fd(skel->progs.clear_percpu_hash_kptr);
+	err = bpf_prog_test_run_opts(fd, &syscall_opts);
+	if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+		goto out;
+	if (!ASSERT_EQ(syscall_opts.retval, 1, "syscall_opts.retval"))
 		goto out;
 
 	fd = bpf_program__fd(skel->progs.check_percpu_hash_refcount);
diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
index 195d3b2fba00..62c1b1325ec2 100644
--- a/tools/testing/selftests/bpf/progs/htab_update.c
+++ b/tools/testing/selftests/bpf/progs/htab_update.c
@@ -22,8 +22,8 @@ struct {
 int pid = 0;
 int update_err = 0;
 
-SEC("?fentry/bpf_obj_free_fields")
-int bpf_obj_free_fields(void *ctx)
+SEC("?fentry/bpf_obj_cancel_fields")
+int bpf_obj_cancel_fields(void *ctx)
 {
 	__u32 key = 0;
 	struct val value = { .payload = 1 };
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
index 421f40835acd..fa97faa5358b 100644
--- a/tools/testing/selftests/bpf/progs/linked_list.c
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -290,6 +290,77 @@ int test_list_in_list(struct bpf_spin_lock *lock, struct bpf_list_head *head)
 	return list_in_list(lock, head, true);
 }
 
+#define MAX_LIST_CLEAR_NODES 256
+
+static __always_inline
+int clear_list(struct bpf_spin_lock *lock, struct bpf_list_head *head)
+{
+	struct bpf_list_node *n;
+	int i;
+
+	for (i = 0; i < MAX_LIST_CLEAR_NODES; i++) {
+		bpf_spin_lock(lock);
+		n = bpf_list_pop_front(head);
+		bpf_spin_unlock(lock);
+		if (!n)
+			return 0;
+		bpf_obj_drop(container_of(n, struct foo, node2));
+	}
+	return 1;
+}
+
+SEC("syscall")
+int clear_map_list(void *ctx)
+{
+	struct map_value *v;
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 1;
+	return clear_list(&v->lock, &v->head);
+}
+
+SEC("syscall")
+int clear_inner_map_list(void *ctx)
+{
+	struct map_value *v;
+	void *map;
+
+	map = bpf_map_lookup_elem(&map_of_maps, &(int){0});
+	if (!map)
+		return 1;
+	v = bpf_map_lookup_elem(map, &(int){0});
+	if (!v)
+		return 1;
+	return clear_list(&v->lock, &v->head);
+}
+
+SEC("syscall")
+int clear_global_list(void *ctx)
+{
+	return clear_list(&glock, &ghead);
+}
+
+SEC("syscall")
+int clear_global_nested_list(void *ctx)
+{
+	return clear_list(&ghead_nested.inner.lock, &ghead_nested.inner.head);
+}
+
+SEC("syscall")
+int clear_global_array_list(void *ctx)
+{
+	int ret;
+
+	ret = clear_list(&glock_c, &ghead_array[0]);
+	if (ret)
+		return ret;
+	ret = clear_list(&glock_c, &ghead_array[1]);
+	if (ret)
+		return ret;
+	return clear_list(&glock_c, &ghead_array_one[0]);
+}
+
 SEC("tc")
 int map_list_push_pop(void *ctx)
 {
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
index 13de169ad68f..61906f48025c 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
@@ -1036,13 +1036,31 @@ int percpu_hash_refcount_leak(void *ctx)
 	struct map_value *v;
 	int key = 0;
 
-	v = bpf_map_lookup_elem(&percpu_hash, &key);
+	v = bpf_map_lookup_percpu_elem(&percpu_hash, &key, 0);
 	if (!v)
 		return 0;
 
 	return __insert_in_list(&head, &lock, &v->node);
 }
 
+SEC("syscall")
+int clear_percpu_hash_kptr(void *ctx)
+{
+	struct node_data *n;
+	struct map_value *v;
+	int key = 0;
+
+	v = bpf_map_lookup_percpu_elem(&percpu_hash, &key, 0);
+	if (!v)
+		return 0;
+
+	n = bpf_kptr_xchg(&v->node, NULL);
+	if (!n)
+		return 0;
+	bpf_obj_drop(n);
+	return probe_read_refcount();
+}
+
 SEC("tc")
 int check_percpu_hash_refcount(void *ctx)
 {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle
  2026-06-09 20:25 ` [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle Kumar Kartikeya Dwivedi
@ 2026-06-09 20:44   ` sashiko-bot
  2026-06-09 21:09   ` bot+bpf-ci
  1 sibling, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09 20:44 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Map updates and deletions on field-capable map types fail to fully clear special BTF fields (BPF-001 violation).
- [Low] Multi-line comment uses the deprecated style with text starting on the same line as the opening '/*'.
--

commit ffac6bea087651cd2b865ac62682061845ecd16e
Author: Justin Suess <utilityemal77@gmail.com>

bpf: Cancel special fields on map value recycle

This commit introduces bpf_obj_cancel_fields() to perform NMI-safe cleanup
of timer, workqueue, and task_work fields on the reusable-value path. It
updates array and hashtab update and delete paths to use this cancel
helper instead of bpf_obj_free_fields().

> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c

[ ... ]

> @@ -384,7 +384,7 @@ static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
>  	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
>  		val = this_cpu_ptr(array->pptrs[index & array->index_mask]);
>  		copy_map_value(map, val, value);
> -		bpf_obj_free_fields(array->map.record, val);
> +		bpf_obj_cancel_fields(map, val);

[Severity: High]
Does this change violate the BPF subsystem invariant for map updates?

When copy_map_value() overwrites an existing map value, the subsystem guide
states that bpf_obj_free_fields() must be called to fully release resources
held by the old value. By switching to bpf_obj_cancel_fields() in
array_map_update_elem(), it appears fields like BPF_KPTR_REF or
BPF_LIST_HEAD are skipped, which could retain kernel resources in recycled
elements.

>  	} else {
>  		val = array->value +
>  			(u64)array->elem_size * (index & array->index_mask);
> @@ -392,7 +392,7 @@ static long array_map_update_elem(struct bpf_map *map, void *key, void *value,
>  			copy_map_value_locked(map, val, value, false);
>  		else
>  			copy_map_value(map, val, value);
> -		bpf_obj_free_fields(array->map.record, val);
> +		bpf_obj_cancel_fields(map, val);
>  	}
>  	return 0;
>  }

[ ... ]

> @@ -432,14 +432,14 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>  		cpu = map_flags >> 32;
>  		ptr = per_cpu_ptr(pptr, cpu);
>  		copy_map_value(map, ptr, value);
> -		bpf_obj_free_fields(array->map.record, ptr);
> +		bpf_obj_cancel_fields(map, ptr);

[Severity: High]
Could this leak old BTF fields during a per-cpu array update?

Similar to the issue above, replacing bpf_obj_free_fields() with
bpf_obj_cancel_fields() here in bpf_percpu_array_update() seems to
bypass the full destruction of special fields when overwriting the
old map value.

>  		goto unlock;
>  	}
>  	for_each_possible_cpu(cpu) {
>  		ptr = per_cpu_ptr(pptr, cpu);
>  		val = (map_flags & BPF_F_ALL_CPUS) ? value : value + size * cpu;
>  		copy_map_value(map, ptr, val);
> -		bpf_obj_free_fields(array->map.record, ptr);
> +		bpf_obj_cancel_fields(map, ptr);
>  	}
>  unlock:
>  	rcu_read_unlock();

> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c

[ ... ]

> @@ -833,8 +837,8 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
>  	return insn - insn_buf;
>  }
>  
> -static void check_and_free_fields(struct bpf_htab *htab,
> -				  struct htab_elem *elem)
> +static void check_and_cancel_fields(struct bpf_htab *htab,
> +				    struct htab_elem *elem)
>  {
>  	if (IS_ERR_OR_NULL(htab->map.record))
>  		return;
> @@ -844,11 +848,11 @@ static void check_and_free_fields(struct bpf_htab *htab,
>  		int cpu;
>  
>  		for_each_possible_cpu(cpu)
> -			bpf_obj_free_fields(htab->map.record, per_cpu_ptr(pptr, cpu));
> +			bpf_obj_cancel_fields(&htab->map, per_cpu_ptr(pptr, cpu));
>  	} else {
>  		void *map_value = htab_elem_value(elem, htab->map.key_size);
>  
> -		bpf_obj_free_fields(htab->map.record, map_value);
> +		bpf_obj_cancel_fields(&htab->map, map_value);

[Severity: High]
Does this leave special fields partially initialized?

By calling bpf_obj_cancel_fields() instead of bpf_obj_free_fields() in
check_and_cancel_fields(), the map value is not fully zeroed and special
fields are not fully released. This is required for operations reusing
these elements across both delete and update paths.

>  	}
>  }

[ ... ]

> @@ -1252,11 +1256,11 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>  	if (l_old) {
>  		hlist_nulls_del_rcu(&l_old->hash_node);
>  
> -		/* l_old has already been stashed in htab->extra_elems, free
> -		 * its special fields before it is available for reuse.
> +		/* l_old has already been stashed in htab->extra_elems, cancel
> +		 * its reusable special fields before it is available for reuse.
>  		 */

[Severity: Low]
This isn't a bug, but does this multi-line comment follow the preferred
BPF subsystem style?

The BPF guidelines strictly require that multi-line comments have the
opening '/*' on its own line, with the comment text starting on the
next line.

>  		if (htab_is_prealloc(htab))
> -			check_and_free_fields(htab, l_old);
> +			check_and_cancel_fields(htab, l_old);

[Severity: High]
Is it safe to reuse the old element without dropping its special fields?

In htab_map_update_elem(), replacing check_and_free_fields() with
check_and_cancel_fields() means that when the old element is pushed to the
extra_elems or freelist, it may still hold onto kernel resources like
kptrs or list heads.

>  	}
>  	htab_unlock_bucket(b, flags);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609202548.3571690-1-memxor@gmail.com?part=2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle
  2026-06-09 20:25 ` [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle Kumar Kartikeya Dwivedi
  2026-06-09 20:44   ` sashiko-bot
@ 2026-06-09 21:09   ` bot+bpf-ci
  1 sibling, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-06-09 21:09 UTC (permalink / raw)
  To: memxor, bpf
  Cc: utilityemal77, ast, andrii, daniel, eddyz87, emil, kkd,
	kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

> Would a WARN_ON_ONCE for in_nmi in bpf_obj_free_fields help spot these kind
> of bugs in the future?
>
> Usually calling bpf_obj_free_fields in_nmi won't result in a deadlock unless
> specific conditions are met (ie last reference to a refcounted object, using
> a workqueue, etc).
>
> Adding a check here would quickly surface bugs of this class.

Reference: https://lore.kernel.org/bpf/aicB9SZfVnUV1rz-@zenbox

While there was discussion explaining rationale against this approach due to
panic-on-warn concerns and existing guardrails, the suggestion itself was not
implemented. Would it be worth revisiting this as a future enhancement, perhaps
with a rate-limited warning or a config-gated check to help catch this class of
bugs during development without impacting production systems?


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27234422852

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs
  2026-06-09 20:25 [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock Kumar Kartikeya Dwivedi
  2026-06-09 20:25 ` [PATCH bpf-next v3 1/4] bpf: Reject bpf_obj_drop() from tracing progs Kumar Kartikeya Dwivedi
  2026-06-09 20:25 ` [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle Kumar Kartikeya Dwivedi
@ 2026-06-09 20:25 ` Kumar Kartikeya Dwivedi
  2026-06-09 20:36   ` sashiko-bot
  2026-06-09 21:09   ` bot+bpf-ci
  2026-06-09 20:25 ` [PATCH bpf-next v3 4/4] selftests/bpf: Exercise kptr map update lifetime Kumar Kartikeya Dwivedi
  2026-06-10 15:10 ` [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock patchwork-bot+netdevbpf
  4 siblings, 2 replies; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-06-09 20:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Emil Tsalapatis, Justin Suess, kkd, kernel-team

Add task_kfunc failure cases for bpf_obj_drop() on local objects with
referenced kptr fields from tracing and NMI tracing programs. These programs
must be rejected because dropping the object would run full special-field
destruction synchronously in an unsafe context.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/progs/task_kfunc_failure.c  | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
index 8e947d445f8e..8942b5478129 100644
--- a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
@@ -5,6 +5,7 @@
 #include <bpf/bpf_tracing.h>
 #include <bpf/bpf_helpers.h>
 
+#include "../bpf_experimental.h"
 #include "bpf_misc.h"
 #include "task_kfunc_common.h"
 
@@ -233,6 +234,45 @@ int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_
 	return 0;
 }
 
+SEC("tp_btf/task_newtask")
+__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
+int BPF_PROG(task_kfunc_obj_drop_with_kptr, struct task_struct *task, u64 clone_flags)
+{
+	struct __tasks_kfunc_map_value *local;
+
+	local = bpf_obj_new(typeof(*local));
+	if (!local)
+		return 0;
+
+	bpf_obj_drop(local);
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
+int BPF_PROG(task_kfunc_obj_drop_nmi_with_kptr, struct task_struct *task,
+	     u64 clone_flags)
+{
+	struct __tasks_kfunc_map_value *local;
+	struct task_struct *acquired, *old;
+
+	(void)clone_flags;
+
+	local = bpf_obj_new(typeof(*local));
+	if (!local)
+		return 0;
+
+	acquired = bpf_task_acquire(task);
+	if (acquired) {
+		old = bpf_kptr_xchg(&local->task, acquired);
+		if (old)
+			bpf_task_release(old);
+	}
+
+	bpf_obj_drop(local);
+	return 0;
+}
+
 SEC("tp_btf/task_newtask")
 __failure __msg("Possibly NULL pointer passed to trusted R1")
 int BPF_PROG(task_kfunc_from_pid_no_null_check, struct task_struct *task, u64 clone_flags)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs
  2026-06-09 20:25 ` [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs Kumar Kartikeya Dwivedi
@ 2026-06-09 20:36   ` sashiko-bot
  2026-06-09 21:09   ` bot+bpf-ci
  1 sibling, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09 20:36 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The test `task_kfunc_obj_drop_nmi_with_kptr` does not test NMI tracing programs as claimed by the commit message and its function name. Instead, it uses `SEC("tp_btf/task_newtask")`, which makes it a standard tracing program. Furthermore, because it fails at load-time for the exact same verifier reason as `task_kfunc_obj_drop_with_kptr`, the second test provides redundant coverage and leaves the intended NMI tracing context (`BPF_PROG_TYPE_PERF_EVENT`) completely untested.
--

commit e2b410eb90d404473252178d121d1547bd924299
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Exercise unsafe obj drops from tracing progs

This commit adds task_kfunc failure cases for bpf_obj_drop on local
objects with referenced kptr fields. It intends to verify that the
verifier rejects these operations from both standard tracing and NMI
tracing programs.

> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> --- a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
[ ... ]
> +SEC("tp_btf/task_newtask")
> +__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
> +int BPF_PROG(task_kfunc_obj_drop_with_kptr, struct task_struct *task, u64 clone_flags)
> +{
[ ... ]
> +}
> +
> +SEC("tp_btf/task_newtask")
> +__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
> +int BPF_PROG(task_kfunc_obj_drop_nmi_with_kptr, struct task_struct *task,
> +	     u64 clone_flags)

[Severity: Medium]
Does this test actually exercise the NMI tracing context as intended?

The function is named task_kfunc_obj_drop_nmi_with_kptr and the commit
message mentions NMI tracing programs, but the section used here is
SEC("tp_btf/task_newtask"). Should this be using an NMI-specific section
like SEC("perf_event") instead?

Because this uses the same tracing section as the previous test, it fails at
load-time for the exact same verifier reason as task_kfunc_obj_drop_with_kptr,
providing redundant coverage and leaving the NMI context untested.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609202548.3571690-1-memxor@gmail.com?part=3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs
  2026-06-09 20:25 ` [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs Kumar Kartikeya Dwivedi
  2026-06-09 20:36   ` sashiko-bot
@ 2026-06-09 21:09   ` bot+bpf-ci
  1 sibling, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-06-09 21:09 UTC (permalink / raw)
  To: memxor, bpf
  Cc: ast, andrii, daniel, eddyz87, emil, utilityemal77, kkd,
	kernel-team, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2479 bytes --]

> diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> index 8e947d445f8e..8942b5478129 100644
> --- a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
> +++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c

[ ... ]

> @@ -233,6 +234,45 @@ int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_
>  	return 0;
>  }
>
> +SEC("tp_btf/task_newtask")
> +__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
> +int BPF_PROG(task_kfunc_obj_drop_with_kptr, struct task_struct *task, u64 clone_flags)
> +{

[ ... ]

> +SEC("tp_btf/task_newtask")
> +__failure __msg("bpf_obj_drop cannot be used in tracing programs on types with NMI unsafe fields")
> +int BPF_PROG(task_kfunc_obj_drop_nmi_with_kptr, struct task_struct *task,
> +	     u64 clone_flags)
> +{

Does task_kfunc_obj_drop_nmi_with_kptr actually exercise the NMI tracing
context its name and the commit message refer to?

It uses SEC("tp_btf/task_newtask"), the same standard tracing section as
task_kfunc_obj_drop_with_kptr just above it, so it would fail at load time
for the same verifier reason rather than from an NMI-specific path.

Would an NMI context such as BPF_PROG_TYPE_PERF_EVENT / SEC("perf_event")
be needed here to actually cover the NMI tracing case?

As written the two tests appear to provide redundant coverage and leave the
intended NMI context untested.

For context, an earlier version used SEC("tp_btf/nmi_handler"), which was
x86-only; switching to SEC("tp_btf/task_newtask") fixed that portability
issue but no longer exercises any NMI context.  This was raised in review
on the list and appears unaddressed in the current version:

  https://lore.kernel.org/bpf/20260609202548.3571690-4-memxor@gmail.com

> +	struct __tasks_kfunc_map_value *local;
> +	struct task_struct *acquired, *old;
> +
> +	(void)clone_flags;
> +
> +	local = bpf_obj_new(typeof(*local));
> +	if (!local)
> +		return 0;
> +
> +	acquired = bpf_task_acquire(task);
> +	if (acquired) {
> +		old = bpf_kptr_xchg(&local->task, acquired);
> +		if (old)
> +			bpf_task_release(old);
> +	}
> +
> +	bpf_obj_drop(local);
> +	return 0;
> +}


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27234422852

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH bpf-next v3 4/4] selftests/bpf: Exercise kptr map update lifetime
  2026-06-09 20:25 [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2026-06-09 20:25 ` [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs Kumar Kartikeya Dwivedi
@ 2026-06-09 20:25 ` Kumar Kartikeya Dwivedi
  2026-06-09 20:51   ` sashiko-bot
  2026-06-10 15:10 ` [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock patchwork-bot+netdevbpf
  4 siblings, 1 reply; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-06-09 20:25 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Eduard Zingerman, Emil Tsalapatis, Justin Suess, kkd, kernel-team

Add focused map_kptr coverage for BPF-side map updates that touch values
containing referenced kptrs.

The new syscall programs stash the testmod refcounted object in an array
map, a preallocated hash map, and a no-prealloc hash map, then update the
same map from BPF. The refcount must remain elevated after the update,
while the userspace runner destroys the skeleton and reuses the existing
refcount wait to confirm map teardown releases the kptr.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/map_kptr.c       | 56 ++++++++++++
 tools/testing/selftests/bpf/progs/map_kptr.c  | 89 ++++++++++++++++++-
 2 files changed, 142 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
index ec6f2f2e8308..17e707dddda8 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
@@ -143,12 +143,68 @@ static void wait_for_map_release(void)
 	map_kptr__destroy(skel);
 }
 
+enum map_update_kptr_case {
+	MAP_UPDATE_KPTR_ARRAY,
+	MAP_UPDATE_KPTR_HASH,
+	MAP_UPDATE_KPTR_HASH_MALLOC,
+};
+
+static struct bpf_program *map_update_kptr_prog(struct map_kptr *skel,
+						enum map_update_kptr_case test)
+{
+	switch (test) {
+	case MAP_UPDATE_KPTR_ARRAY:
+		return skel->progs.test_array_map_update_kptr;
+	case MAP_UPDATE_KPTR_HASH:
+		return skel->progs.test_hash_map_update_kptr;
+	case MAP_UPDATE_KPTR_HASH_MALLOC:
+		return skel->progs.test_hash_malloc_map_update_kptr;
+	}
+
+	return NULL;
+}
+
+static void test_map_update_kptr(enum map_update_kptr_case test)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	struct map_kptr *skel;
+	struct bpf_program *prog;
+	int ret;
+
+	skel = map_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "map_kptr__open_and_load"))
+		return;
+
+	prog = map_update_kptr_prog(skel, test);
+	if (!ASSERT_OK_PTR(prog, "map_update_kptr_prog"))
+		goto out;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
+	if (!ASSERT_OK(ret, "map_update_kptr"))
+		goto out;
+	if (!ASSERT_OK(opts.retval, "map_update_kptr retval"))
+		goto out;
+
+	ASSERT_EQ(skel->bss->num_of_refs, 3, "refs_after_update");
+
+out:
+	map_kptr__destroy(skel);
+	wait_for_map_release();
+}
+
 void serial_test_map_kptr(void)
 {
 	struct rcu_tasks_trace_gp *skel;
 
 	RUN_TESTS(map_kptr_fail);
 
+	if (test__start_subtest("update_array_map_kptr"))
+		test_map_update_kptr(MAP_UPDATE_KPTR_ARRAY);
+	if (test__start_subtest("update_hash_map_kptr"))
+		test_map_update_kptr(MAP_UPDATE_KPTR_HASH);
+	if (test__start_subtest("update_hash_malloc_map_kptr"))
+		test_map_update_kptr(MAP_UPDATE_KPTR_HASH_MALLOC);
+
 	skel = rcu_tasks_trace_gp__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "rcu_tasks_trace_gp__open_and_load"))
 		return;
diff --git a/tools/testing/selftests/bpf/progs/map_kptr.c b/tools/testing/selftests/bpf/progs/map_kptr.c
index e708ffbe1f61..3fbefc568e0a 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr.c
@@ -489,8 +489,7 @@ int test_map_kptr_ref3(struct __sk_buff *ctx)
 
 int num_of_refs;
 
-SEC("syscall")
-int count_ref(void *ctx)
+static __always_inline int read_ref_count(void)
 {
 	struct prog_test_ref_kfunc *p;
 	unsigned long arg = 0;
@@ -500,11 +499,95 @@ int count_ref(void *ctx)
 		return 1;
 
 	num_of_refs = p->cnt.refs.counter;
-
 	bpf_kfunc_call_test_release(p);
 	return 0;
 }
 
+SEC("syscall")
+int count_ref(void *ctx)
+{
+	return read_ref_count();
+}
+
+static __always_inline int stash_ref_ptr(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p, *old;
+	unsigned long arg = 0;
+
+	p = bpf_kfunc_call_test_acquire(&arg);
+	if (!p)
+		return 1;
+
+	old = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (old) {
+		bpf_kfunc_call_test_release(old);
+		old = bpf_kptr_xchg(&v->ref_ptr, NULL);
+		if (old)
+			bpf_kfunc_call_test_release(old);
+		return 2;
+	}
+	return 0;
+}
+
+static __always_inline int check_refs(int expected)
+{
+	int ret;
+
+	ret = read_ref_count();
+	if (ret)
+		return ret;
+	return num_of_refs == expected ? 0 : 3;
+}
+
+SEC("syscall")
+int test_array_map_update_kptr(void *ctx)
+{
+	struct map_value init = {}, *v;
+	int key = 0, ret;
+
+	v = bpf_map_lookup_elem(&array_map, &key);
+	if (!v)
+		return 1;
+	ret = stash_ref_ptr(v);
+	if (ret)
+		return ret;
+	ret = check_refs(3);
+	if (ret)
+		return ret;
+	ret = bpf_map_update_elem(&array_map, &key, &init, BPF_EXIST);
+	if (ret)
+		return 4;
+	return check_refs(3);
+}
+
+#define DEFINE_HASH_UPDATE_KPTR_TEST(name, map)			\
+SEC("syscall")							\
+int name(void *ctx)						\
+{								\
+	struct map_value init = {}, *v;				\
+	int key = 0, ret;					\
+								\
+	ret = bpf_map_update_elem(&map, &key, &init, BPF_NOEXIST); \
+	if (ret)						\
+		return 1;					\
+	v = bpf_map_lookup_elem(&map, &key);			\
+	if (!v)							\
+		return 2;					\
+	ret = stash_ref_ptr(v);					\
+	if (ret)						\
+		return ret;					\
+	ret = check_refs(3);					\
+	if (ret)						\
+		return ret;					\
+	ret = bpf_map_update_elem(&map, &key, &init, BPF_EXIST); \
+	if (ret)						\
+		return 4;					\
+	return check_refs(3);					\
+}
+
+DEFINE_HASH_UPDATE_KPTR_TEST(test_hash_map_update_kptr, hash_map)
+DEFINE_HASH_UPDATE_KPTR_TEST(test_hash_malloc_map_update_kptr, hash_malloc_map)
+
 SEC("syscall")
 int test_ls_map_kptr_ref1(void *ctx)
 {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 4/4] selftests/bpf: Exercise kptr map update lifetime
  2026-06-09 20:25 ` [PATCH bpf-next v3 4/4] selftests/bpf: Exercise kptr map update lifetime Kumar Kartikeya Dwivedi
@ 2026-06-09 20:51   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09 20:51 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The newly added selftests (`test_array_map_update_kptr` and `test_hash_map_update_kptr`) codify a preexisting bug where `bpf_map_update_elem` fails to free kptrs in hash and array maps. The tests assert that the kptr refcount remains elevated (`check_refs(3)`) after the update, incorrectly locking in the broken BTF field lifecycle behavior.
--

commit ca10a096e58e2f06a9a6605b1b8ff02a49d2d192
Author: Kumar Kartikeya Dwivedi <memxor@gmail.com>

selftests/bpf: Exercise kptr map update lifetime

This commit adds focused map_kptr coverage for BPF-side map updates that
touch values containing referenced kptrs. The test stashes refcounted objects
in array and hash maps, then updates them from BPF to verify the refcount
behavior during map teardown.

> diff --git a/tools/testing/selftests/bpf/progs/map_kptr.c b/tools/testing/selftests/bpf/progs/map_kptr.c
> index e708ffbe1f610..3fbefc568e0a6 100644
> --- a/tools/testing/selftests/bpf/progs/map_kptr.c
> +++ b/tools/testing/selftests/bpf/progs/map_kptr.c
[ ... ]
> @@ -500,11 +499,95 @@ int count_ref(void *ctx)
[ ... ]
> +SEC("syscall")
> +int test_array_map_update_kptr(void *ctx)
> +{
> +	struct map_value init = {}, *v;
> +	int key = 0, ret;
> +
> +	v = bpf_map_lookup_elem(&array_map, &key);
> +	if (!v)
> +		return 1;
> +	ret = stash_ref_ptr(v);
> +	if (ret)
> +		return ret;
> +	ret = check_refs(3);
> +	if (ret)
> +		return ret;
> +	ret = bpf_map_update_elem(&array_map, &key, &init, BPF_EXIST);
> +	if (ret)
> +		return 4;
> +	return check_refs(3);

[Severity: High]
This is a pre-existing issue, but does this test codify a bug where
bpf_map_update_elem() fails to free kptrs in hash and array maps?

When a map element is overwritten with an empty value, shouldn't the kptr
reference count drop? According to the BTF field lifecycle guidelines, map
updates that overwrite existing values with copy_map_value() must call
bpf_obj_free_fields() to release resources held by the old value.

However, it appears array_map_update_elem() in kernel/bpf/arraymap.c calls
bpf_obj_cancel_fields():

kernel/bpf/arraymap.c:array_map_update_elem() {
    ...
    copy_map_value(map, val, value);
    bpf_obj_cancel_fields(map, val);
    ...
}

And free_htab_elem() in kernel/bpf/hashtab.c calls
check_and_cancel_fields() before pushing to the freelist:

kernel/bpf/hashtab.c:free_htab_elem() {
    ...
    check_and_cancel_fields(htab, l);
    pcpu_freelist_push(&htab->freelist, &l->fnode);
    ...
}

Both of these paths skip kptr cleanup.

By asserting that the refcount remains unchanged at 3 after the update in
test_array_map_update_kptr() and test_hash_map_update_kptr(), the tests appear
to incorrectly lock in the broken BTF field lifecycle behavior where array
kptrs leak and preallocated hash map kptrs are resurrected when reallocated.

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609202548.3571690-1-memxor@gmail.com?part=4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock
  2026-06-09 20:25 [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2026-06-09 20:25 ` [PATCH bpf-next v3 4/4] selftests/bpf: Exercise kptr map update lifetime Kumar Kartikeya Dwivedi
@ 2026-06-10 15:10 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-06-10 15:10 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, ast, andrii, daniel, eddyz87, emil, utilityemal77, kkd,
	kernel-team

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue,  9 Jun 2026 22:25:42 +0200 you wrote:
> Referenced kptr destruction can run from tracing/NMI contexts through
> bpf_obj_drop() and map value update/delete paths, reaching NMI-unsafe
> special field teardown and deadlocks. Justin reported the issue and
> iterated on fixes in [0]-[2], and also confirmed the bpf_obj_drop()
> reproducer in [3].
> 
> This series rejects unsafe obj drops from non-iterator tracing programs,
> limits map value recycle to NMI-safe field cancellation, and adds
> focused selftests for the obj_drop(), NMI delete, and recycle teardown
> cases.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/4] bpf: Reject bpf_obj_drop() from tracing progs
    https://git.kernel.org/bpf/bpf-next/c/94c8d1c21be4
  - [bpf-next,v3,2/4] bpf: Cancel special fields on map value recycle
    https://git.kernel.org/bpf/bpf-next/c/a3a81d247651
  - [bpf-next,v3,3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs
    https://git.kernel.org/bpf/bpf-next/c/4b84518137ce
  - [bpf-next,v3,4/4] selftests/bpf: Exercise kptr map update lifetime
    https://git.kernel.org/bpf/bpf-next/c/2e7c6cb4d843

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-06-10 15:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09 20:25 [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock Kumar Kartikeya Dwivedi
2026-06-09 20:25 ` [PATCH bpf-next v3 1/4] bpf: Reject bpf_obj_drop() from tracing progs Kumar Kartikeya Dwivedi
2026-06-09 20:58   ` bot+bpf-ci
2026-06-09 20:25 ` [PATCH bpf-next v3 2/4] bpf: Cancel special fields on map value recycle Kumar Kartikeya Dwivedi
2026-06-09 20:44   ` sashiko-bot
2026-06-09 21:09   ` bot+bpf-ci
2026-06-09 20:25 ` [PATCH bpf-next v3 3/4] selftests/bpf: Exercise unsafe obj drops from tracing progs Kumar Kartikeya Dwivedi
2026-06-09 20:36   ` sashiko-bot
2026-06-09 21:09   ` bot+bpf-ci
2026-06-09 20:25 ` [PATCH bpf-next v3 4/4] selftests/bpf: Exercise kptr map update lifetime Kumar Kartikeya Dwivedi
2026-06-09 20:51   ` sashiko-bot
2026-06-10 15:10 ` [PATCH bpf-next v3 0/4] Fix kptr dtor deadlock patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox