* [PATCH bpf-next v3 0/2] Allow storing referenced struct file kptrs in BPF maps
@ 2026-04-24 19:22 Justin Suess
2026-04-24 19:22 ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
2026-04-24 19:22 ` [PATCH bpf-next v3 2/2] selftests/bpf: Add test for map-stored struct file kptrs Justin Suess
0 siblings, 2 replies; 10+ messages in thread
From: Justin Suess @ 2026-04-24 19:22 UTC (permalink / raw)
To: ast, daniel, andrii, eddyz87, memxor
Cc: martin.lau, song, yonghong.song, jolsa, bpf, mic, Justin Suess
Hello,
This series adds a destructor for struct file, enabling it to be used as a
referenced kptr in BPF maps.
This iteration of the series just addresses a small test failure.
v1: https://lore.kernel.org/bpf/20260420203306.3107246-1-utilityemal77@gmail.com/
v2: https://lore.kernel.org/bpf/20260423152239.4140627-1-utilityemal77@gmail.com/
The destructor mirrors bpf_put_file() semantics and releases the reference
via fput(). This allows pointers returned from kfuncs such as
bpf_get_task_exe_file() to be safely stored and later reused with helpers
and kfuncs that operate on struct file.
This fills a gap compared to bpf_dynptr_from_file(), where the resulting
dynptr can be stored in a map, but cannot be passed to kfuncs expecting
a struct file *.
Use cases include caching file references across events and deferring
processing while keeping the underlying file alive.
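As a sketch of the pattern this series enables, a program can stash the reference with bpf_kptr_xchg() (condensed from the selftests in patch 2; map and function names are illustrative):

```c
/* Map value holding a referenced struct file kptr. */
struct file_map_value {
	struct file __kptr *file;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct file_map_value);
	__uint(max_entries, 1);
} stashed_files SEC(".maps");

SEC("lsm.s/file_open")
int stash_exe_file(struct file *ctx_file)
{
	struct file_map_value *v;
	struct file *file, *old;
	int zero = 0;

	v = bpf_map_lookup_elem(&stashed_files, &zero);
	if (!v)
		return 0;

	/* Acquire a reference; the map takes ownership on xchg. */
	file = bpf_get_task_exe_file(bpf_get_current_task_btf());
	if (!file)
		return 0;

	old = bpf_kptr_xchg(&v->file, file);
	if (old)
		bpf_put_file(old);
	/* The dtor added in patch 1 releases the stashed reference when
	 * the map entry is overwritten or the map is freed. */
	return 0;
}
```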
Patch 1 adds the struct file kptr destructor and wires it into the BTF
kfunc sets.
Patch 2 adds selftests covering successful use and verifier rejection of
unreferenced pointers.
Changes for v3:
- Fix variable name in failure test case to pass.
- Better handle fd closure for file copy helper.
- Remove dead test function.
Changes for v2:
- Fix comment formatting based on Sashiko feedback.
- Fix typo in commit based on Sashiko feedback.
- Remove close_fd helper based on Song Liu's feedback.
- Expand selftests to include success and failure verifier tests.
Note on NMI safety:
After further investigation prompted by this Sashiko report [1], a
pre-existing NMI-safety bug was found and reported separately, with a
reproducer [2]. In short, map dtors can be invoked from NMI context by
attaching to specific tracepoints; because the existing dtors rely on
RCU and/or locks, this can deadlock. The fput() added by this patch is
likewise unsafe in NMI context and affected by that bug.
However, the fix requires deeper investigation into verifier safety in
NMI contexts, and is outside the scope of this series.
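For reference, the problematic chain (detailed in the reports below) is,
in sketch form:

```
BPF prog in NMI context
  -> map update/delete invokes kptr dtor
    -> fput()                  /* final reference dropped */
      -> __fput_deferred()     /* in_interrupt() is true */
        -> schedule_delayed_work()
          -> add_timer()       /* takes timer_base->lock: deadlock if
                                  the NMI interrupted the lock holder */
```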
[1] : https://lore.kernel.org/bpf/20260421010536.17FB1C19425@smtp.kernel.org/
[2] : https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
Justin Suess (2):
bpf: Implement dtor for struct file BTF ID
selftests/bpf: Add test for map-stored struct file kptrs
kernel/bpf/helpers.c | 16 +-
.../bpf/prog_tests/refcounted_kptr_file.c | 247 ++++++++++++++++++
.../bpf/progs/refcounted_kptr_file.c | 158 +++++++++++
.../bpf/progs/refcounted_kptr_file_fail.c | 141 ++++++++++
.../bpf/progs/refcounted_kptr_file_success.c | 61 +++++
5 files changed, 622 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/refcounted_kptr_file.c
create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_file.c
create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_file_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_file_success.c
--
2.53.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-24 19:22 [PATCH bpf-next v3 0/2] Allow storing referenced struct file kptrs in BPF maps Justin Suess
@ 2026-04-24 19:22 ` Justin Suess
2026-04-24 19:52 ` bot+bpf-ci
2026-04-24 19:59 ` sashiko-bot
2026-04-24 19:22 ` [PATCH bpf-next v3 2/2] selftests/bpf: Add test for map-stored struct file kptrs Justin Suess
1 sibling, 2 replies; 10+ messages in thread
From: Justin Suess @ 2026-04-24 19:22 UTC (permalink / raw)
To: ast, daniel, andrii, eddyz87, memxor
Cc: martin.lau, song, yonghong.song, jolsa, bpf, mic, Justin Suess
Implement a destructor for struct file, allowing it to be stored in a
map as a referenced kptr.
The dtor is analogous to bpf_put_file() and calls fput() to release the
reference.
Acked-by: Song Liu <song@kernel.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
---
Changes in v3:
- None
Changes in v2:
- None
kernel/bpf/helpers.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index baa12b24bb64..4739e2bf0f31 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -20,6 +20,7 @@
#include <linux/proc_ns.h>
#include <linux/sched/task.h>
#include <linux/security.h>
+#include <linux/fs.h>
#include <linux/btf_ids.h>
#include <linux/bpf_mem_alloc.h>
#include <linux/kasan.h>
@@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
}
CFI_NOSEAL(bpf_task_release_dtor);
+__bpf_kfunc void bpf_file_release_dtor(void *file)
+{
+ fput((struct file *)file);
+}
+
+CFI_NOSEAL(bpf_file_release_dtor);
+
#ifdef CONFIG_CGROUPS
/**
* bpf_cgroup_acquire - Acquire a reference to a cgroup. A cgroup acquired by
@@ -4763,6 +4771,8 @@ static const struct btf_kfunc_id_set generic_kfunc_set = {
BTF_ID_LIST(generic_dtor_ids)
BTF_ID(struct, task_struct)
BTF_ID(func, bpf_task_release_dtor)
+BTF_ID(struct, file)
+BTF_ID(func, bpf_file_release_dtor)
#ifdef CONFIG_CGROUPS
BTF_ID(struct, cgroup)
BTF_ID(func, bpf_cgroup_release_dtor)
@@ -4874,11 +4884,15 @@ static int __init kfunc_init(void)
.btf_id = generic_dtor_ids[0],
.kfunc_btf_id = generic_dtor_ids[1]
},
-#ifdef CONFIG_CGROUPS
{
.btf_id = generic_dtor_ids[2],
.kfunc_btf_id = generic_dtor_ids[3]
},
+#ifdef CONFIG_CGROUPS
+ {
+ .btf_id = generic_dtor_ids[4],
+ .kfunc_btf_id = generic_dtor_ids[5]
+ },
#endif
};
--
2.53.0
* [PATCH bpf-next v3 2/2] selftests/bpf: Add test for map-stored struct file kptrs
2026-04-24 19:22 [PATCH bpf-next v3 0/2] Allow storing referenced struct file kptrs in BPF maps Justin Suess
2026-04-24 19:22 ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
@ 2026-04-24 19:22 ` Justin Suess
1 sibling, 0 replies; 10+ messages in thread
From: Justin Suess @ 2026-04-24 19:22 UTC (permalink / raw)
To: ast, daniel, andrii, eddyz87, memxor
Cc: martin.lau, song, yonghong.song, jolsa, bpf, mic, Justin Suess
Add tests for map-stored struct file referenced kptrs.
Add a test storing a referenced kptr in a map. The reference is
acquired with bpf_get_task_exe_file() in this case.
Add a test verifying that the struct file reference remains valid after
the original file descriptor is closed, by reading an extended
attribute that was set on the file beforehand. This validation is
triggered manually by hooking file_open and opening /dev/null.
Add a test verifying that attempting to insert unreferenced struct file
pointers into the map is rejected by the verifier.
Add failure tests for reference counting of struct file objects.
Cc: Song Liu <song@kernel.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
---
Changes in v3:
- Properly close fd in test case.
- Fix variable name in fail test case.
- Remove dead function.
Changes in v2:
- Add success verifier tests per Song Liu's suggestion.
- Remove close_fd per Song Liu's suggestion.
- Fix typos in commit per Sashiko.
- Fix subtest ordering per Sashiko.
- Fix comments per Sashiko.
.../bpf/prog_tests/refcounted_kptr_file.c | 247 ++++++++++++++++++
.../bpf/progs/refcounted_kptr_file.c | 158 +++++++++++
.../bpf/progs/refcounted_kptr_file_fail.c | 141 ++++++++++
.../bpf/progs/refcounted_kptr_file_success.c | 61 +++++
4 files changed, 607 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/refcounted_kptr_file.c
create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_file.c
create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_file_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_file_success.c
diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr_file.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr_file.c
new file mode 100644
index 000000000000..7656f0428812
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr_file.c
@@ -0,0 +1,247 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <errno.h>
+#include <test_progs.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <sys/xattr.h>
+#include <unistd.h>
+
+#include "refcounted_kptr_file_fail.skel.h"
+#include "refcounted_kptr_file.skel.h"
+#include "refcounted_kptr_file_success.skel.h"
+
+static const char shell_path[] = "/bin/sh";
+static const char xattr_name[] = "user.kptr_ref";
+static const char xattr_value[] = "kptr-live";
+
+static int write_full(int fd, const void *buf, size_t len)
+{
+ const char *pos = buf;
+
+ while (len) {
+ ssize_t written;
+
+ written = write(fd, pos, len);
+ if (written < 0) {
+ if (errno == EINTR)
+ continue;
+ return -errno;
+ }
+
+ pos += written;
+ len -= written;
+ }
+
+ return 0;
+}
+
+static int copy_file(const char *src_path, const char *dst_path, mode_t mode)
+{
+ char buf[4096];
+ int src_fd = -1, dst_fd = -1;
+ int err = 0;
+
+ src_fd = open(src_path, O_RDONLY | O_CLOEXEC);
+ if (src_fd < 0)
+ return -errno;
+
+ dst_fd = open(dst_path, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC,
+ mode & 0777);
+ if (dst_fd < 0) {
+ err = -errno;
+ goto out;
+ }
+
+ while (1) {
+ ssize_t bytes_read;
+
+ bytes_read = read(src_fd, buf, sizeof(buf));
+ if (bytes_read < 0) {
+ if (errno == EINTR)
+ continue;
+ err = -errno;
+ goto out;
+ }
+
+ if (!bytes_read)
+ break;
+
+ err = write_full(dst_fd, buf, bytes_read);
+ if (err) {
+ unlink(dst_path);
+ goto out;
+ }
+ }
+
+out:
+ if (src_fd >= 0 && close(src_fd) && !err)
+ err = -errno;
+ if (dst_fd >= 0 && close(dst_fd) && !err)
+ err = -errno;
+ return err;
+}
+
+static bool prepare_tagged_shell(char *temp_dir, size_t temp_dir_sz,
+ char *shell_copy, size_t shell_copy_sz)
+{
+ struct stat st;
+ int err;
+
+ if (!ASSERT_LT(snprintf(temp_dir, temp_dir_sz,
+ "./refcounted_kptr_file.XXXXXX"),
+ (int)temp_dir_sz, "temp_dir_template"))
+ return false;
+
+ if (!ASSERT_OK_PTR(mkdtemp(temp_dir), "mkdtemp"))
+ return false;
+
+ if (!ASSERT_LT(snprintf(shell_copy, shell_copy_sz, "%s/sh", temp_dir),
+ (int)shell_copy_sz, "shell_copy_path"))
+ goto err_rmdir;
+
+ if (!ASSERT_OK(stat(shell_path, &st), "stat_shell"))
+ goto err_rmdir;
+
+ err = copy_file(shell_path, shell_copy, st.st_mode);
+ if (!ASSERT_OK(err, "copy_shell"))
+ goto err_unlink;
+
+ err = setxattr(shell_copy, xattr_name, xattr_value, sizeof(xattr_value), 0);
+ if (err && errno == EOPNOTSUPP) {
+ printf("%s:SKIP:filesystem does not support user xattr (%d)\n",
+ __func__, errno);
+ test__skip();
+ goto err_unlink;
+ }
+
+ if (!ASSERT_OK(err, "setxattr_shell"))
+ goto err_unlink;
+
+ return true;
+
+err_unlink:
+ unlink(shell_copy);
+err_rmdir:
+ rmdir(temp_dir);
+ return false;
+}
+
+static void run_refcounted_file_kptr_success(void)
+{
+ struct refcounted_kptr_file *skel;
+ char shell_copy[PATH_MAX] = {};
+ char temp_dir[PATH_MAX] = {};
+ int pipefd[2] = { -1, -1 };
+ int status;
+ pid_t child_pid = -1;
+ int err, fd = -1;
+
+ skel = refcounted_kptr_file__open();
+ if (!ASSERT_OK_PTR(skel, "refcounted_kptr_file__open"))
+ return;
+
+ err = refcounted_kptr_file__load(skel);
+ if (!ASSERT_OK(err, "refcounted_kptr_file__load"))
+ goto out;
+
+ err = refcounted_kptr_file__attach(skel);
+ if (!ASSERT_OK(err, "refcounted_kptr_file__attach"))
+ goto out;
+
+ if (!prepare_tagged_shell(temp_dir, sizeof(temp_dir), shell_copy,
+ sizeof(shell_copy)))
+ goto out;
+
+ if (!ASSERT_OK(pipe2(pipefd, O_CLOEXEC), "pipe2"))
+ goto out;
+
+ child_pid = fork();
+ if (!ASSERT_GT(child_pid, -1, "fork"))
+ goto out;
+
+ if (child_pid == 0) {
+ char sync;
+
+ close(pipefd[1]);
+ if (read(pipefd[0], &sync, 1) != 1)
+ _exit(127);
+ close(pipefd[0]);
+ execl(shell_copy, shell_copy, "-c", ": </dev/null", NULL);
+ _exit(127);
+ }
+
+ close(pipefd[0]);
+ pipefd[0] = -1;
+
+ /*
+ * Set the PID *before* unblocking the child so that the BPF program's
+ * file_open hook can filter on it from the very first file opened during
+ * exec. If the write happened first, early file_open events during
+ * exec could be missed.
+ */
+ skel->bss->file_kptr_insert_pid = child_pid;
+ err = write_full(pipefd[1], "1", 1);
+ if (!ASSERT_OK(err, "start_child"))
+ goto out;
+ close(pipefd[1]);
+ pipefd[1] = -1;
+
+ if (!ASSERT_EQ(waitpid(child_pid, &status, 0), child_pid, "waitpid"))
+ goto out;
+ child_pid = -1;
+ if (!ASSERT_TRUE(WIFEXITED(status), "child_exited"))
+ goto out;
+ if (!ASSERT_EQ(WEXITSTATUS(status), 0, "child_status"))
+ goto out;
+
+ skel->bss->file_kptr_verify_pid = getpid();
+ /*
+ * The child is gone at this point. Reopening an unrelated file triggers a
+ * second file_open hook where the BPF program validates the stashed ref.
+ * Our test op for the ref validity is reading the xattrs we set earlier.
+ */
+ fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
+ if (!ASSERT_GE(fd, 0, "open_dev_null"))
+ goto out;
+ close(fd);
+ fd = -1;
+
+ ASSERT_EQ(skel->bss->file_kptr_err, 0, "file_kptr_err");
+ ASSERT_EQ(skel->bss->file_kptr_inserted, 1, "file_kptr_inserted");
+ ASSERT_EQ(skel->bss->file_kptr_verified, 1, "file_kptr_verified");
+ ASSERT_EQ(skel->bss->file_kptr_xattr_ret, sizeof(xattr_value),
+ "file_kptr_xattr_ret");
+ ASSERT_EQ(strcmp(skel->bss->file_kptr_value, xattr_value), 0,
+ "file_kptr_value");
+
+out:
+ close(fd);
+ close(pipefd[0]);
+ close(pipefd[1]);
+ if (child_pid > 0)
+ (void)waitpid(child_pid, NULL, 0);
+ if (shell_copy[0])
+ unlink(shell_copy);
+ if (temp_dir[0])
+ rmdir(temp_dir);
+ refcounted_kptr_file__destroy(skel);
+}
+void test_refcounted_kptr_file(void)
+{
+ RUN_TESTS(refcounted_kptr_file_success);
+
+ RUN_TESTS(refcounted_kptr_file_fail);
+
+ if (test__start_subtest("holds_ref_after_close"))
+ run_refcounted_file_kptr_success();
+
+ RUN_TESTS(refcounted_kptr_file);
+}
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_file.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_file.c
new file mode 100644
index 000000000000..510ea1107791
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_file.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_kfuncs.h"
+#include "bpf_experimental.h"
+
+struct file_map_value {
+ struct file __kptr * file;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, int);
+ __type(value, struct file_map_value);
+ __uint(max_entries, 1);
+} stashed_files SEC(".maps");
+
+static const char xattr_name[] = "user.kptr_ref";
+static const char expected_value[] = "kptr-live";
+
+char file_kptr_probe_value[32];
+int file_kptr_insert_pid;
+int file_kptr_verify_pid;
+int file_kptr_inserted;
+int file_kptr_verified;
+int file_kptr_err;
+int file_kptr_xattr_ret;
+char file_kptr_value[32];
+
+SEC("lsm.s/file_open")
+__description("file_kptr: insert referenced file")
+__success
+int insert_file_kptr(struct file *ctx_file)
+{
+ struct bpf_dynptr value_ptr;
+ struct file_map_value *mapval;
+ struct task_struct *task;
+ struct file *file, *old;
+ int ret;
+ int zero = 0;
+
+ (void)ctx_file;
+
+ if ((__u32)(bpf_get_current_pid_tgid() >> 32) != (__u32)file_kptr_insert_pid)
+ return 0;
+
+ if (file_kptr_inserted)
+ return 0;
+
+ mapval = bpf_map_lookup_elem(&stashed_files, &zero);
+ if (!mapval) {
+ file_kptr_err = 1;
+ return 0;
+ }
+
+ task = bpf_get_current_task_btf();
+ file = bpf_get_task_exe_file(task);
+ if (!file) {
+ file_kptr_err = 2;
+ return 0;
+ }
+
+ /*
+ * Exec can open multiple files while the new image is being installed.
+ * Only stash the child's final executable, which we identify by the test
+ * xattr.
+ */
+ ret = bpf_dynptr_from_mem(file_kptr_probe_value,
+ sizeof(file_kptr_probe_value), 0,
+ &value_ptr);
+ if (ret) {
+ file_kptr_err = 8;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ ret = bpf_get_file_xattr(file, xattr_name, &value_ptr);
+ if (ret != sizeof(expected_value) ||
+ bpf_strncmp(file_kptr_probe_value, sizeof(expected_value),
+ expected_value)) {
+ bpf_put_file(file);
+ return 0;
+ }
+
+ old = bpf_kptr_xchg(&mapval->file, file);
+ if (old)
+ bpf_put_file(old);
+
+ file_kptr_inserted = 1;
+ return 0;
+}
+
+SEC("lsm.s/file_open")
+__description("file_kptr: verify referenced file")
+__success
+int verify_file_kptr(struct file *ctx_file)
+{
+ struct bpf_dynptr value_ptr;
+ struct file_map_value *mapval;
+ struct file *file;
+ int zero = 0;
+ int ret;
+
+ (void)ctx_file;
+
+ if ((__u32)(bpf_get_current_pid_tgid() >> 32) != (__u32)file_kptr_verify_pid)
+ return 0;
+
+ if (!file_kptr_inserted || file_kptr_verified)
+ return 0;
+
+ mapval = bpf_map_lookup_elem(&stashed_files, &zero);
+ if (!mapval) {
+ file_kptr_err = 3;
+ return 0;
+ }
+
+ /*
+ * Pull the file out of the map to get a referenced pointer for the xattr
+ * kfunc and to drop the map's last reference once verification completes.
+ */
+ file = bpf_kptr_xchg(&mapval->file, NULL);
+ if (!file) {
+ file_kptr_err = 4;
+ return 0;
+ }
+
+ ret = bpf_dynptr_from_mem(file_kptr_value, sizeof(file_kptr_value), 0,
+ &value_ptr);
+ if (ret) {
+ file_kptr_err = 5;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ ret = bpf_get_file_xattr(file, xattr_name, &value_ptr);
+ file_kptr_xattr_ret = ret;
+ if (ret != sizeof(expected_value)) {
+ file_kptr_err = 6;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ if (bpf_strncmp(file_kptr_value, sizeof(expected_value), expected_value)) {
+ file_kptr_err = 7;
+ bpf_put_file(file);
+ return 0;
+ }
+
+ file_kptr_verified = 1;
+ bpf_put_file(file);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_file_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_file_fail.c
new file mode 100644
index 000000000000..635d2493fd78
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_file_fail.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+struct file_map_value {
+ struct file __kptr * file;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, int);
+ __type(value, struct file_map_value);
+ __uint(max_entries, 1);
+} stashed_files SEC(".maps");
+
+SEC("lsm.s/file_open")
+__failure __msg("R2 type=ctx expected=ptr_, trusted_ptr_, rcu_ptr_")
+int stash_unref_ctx_file(struct file *ctx_file)
+{
+ struct file_map_value *mapval;
+ int zero = 0;
+
+ mapval = bpf_map_lookup_elem(&stashed_files, &zero);
+ if (!mapval)
+ return 0;
+
+ /* ctx_file is just the hook argument, not an acquired file reference. */
+ bpf_kptr_xchg(&mapval->file, ctx_file);
+ return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure
+__msg("Unreleased reference")
+int stash_xchg_file_kptr_unreleased(struct file *ctx_file)
+{
+ struct file_map_value *mapval;
+ struct file *file = NULL;
+ int zero = 0;
+
+ (void)ctx_file;
+
+ mapval = bpf_map_lookup_elem(&stashed_files, &zero);
+ if (!mapval)
+ return 0;
+
+ file = bpf_get_task_exe_file(bpf_get_current_task_btf());
+ if (!file)
+ return 0;
+
+ file = bpf_kptr_xchg(&mapval->file, file);
+ if (file)
+ bpf_put_file(file);
+
+ file = bpf_kptr_xchg(&mapval->file, NULL);
+ if (!file)
+ return 0;
+
+ /* Retrieved kptr is never released. */
+ return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure
+__msg("Possibly NULL pointer passed to trusted R1")
+int stash_xchg_file_kptr_no_null_check(struct file *ctx_file)
+{
+ struct file_map_value *mapval;
+ struct file *file = NULL;
+ int zero = 0;
+
+ (void)ctx_file;
+
+ mapval = bpf_map_lookup_elem(&stashed_files, &zero);
+ if (!mapval)
+ return 0;
+
+ file = bpf_get_task_exe_file(bpf_get_current_task_btf());
+ if (!file)
+ return 0;
+
+ file = bpf_kptr_xchg(&mapval->file, file);
+ if (file)
+ bpf_put_file(file);
+
+ file = bpf_kptr_xchg(&mapval->file, NULL);
+ bpf_put_file(file);
+ return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure
+__msg("R1 must be referenced or trusted")
+int put_map_owned_file(struct file *ctx_file)
+{
+ struct file_map_value *mapval;
+ struct file *file = NULL;
+ int zero = 0;
+
+ (void)ctx_file;
+
+ mapval = bpf_map_lookup_elem(&stashed_files, &zero);
+ if (!mapval)
+ return 0;
+
+ file = bpf_get_task_exe_file(bpf_get_current_task_btf());
+ if (!file)
+ return 0;
+
+ file = bpf_kptr_xchg(&mapval->file, file);
+ if (file)
+ bpf_put_file(file);
+
+ bpf_rcu_read_lock();
+ file = mapval->file;
+ if (!file) {
+ bpf_rcu_read_unlock();
+ return 0;
+ }
+
+ /* Can't release a kptr while it is still owned by the map. */
+ bpf_put_file(file);
+ bpf_rcu_read_unlock();
+ return 0;
+}
+
+SEC("lsm.s/file_open")
+__failure
+__msg("release kernel function bpf_put_file expects")
+int BPF_PROG(put_unref_ctx_file, struct file *file)
+{
+ /* Can't release the unacquired LSM hook argument. */
+ bpf_put_file(file);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_file_success.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_file_success.c
new file mode 100644
index 000000000000..ece00c9e584f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_file_success.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+struct file_map_value {
+ struct file __kptr * file;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, int);
+ __type(value, struct file_map_value);
+ __uint(max_entries, 1);
+} stashed_files SEC(".maps");
+
+SEC("lsm.s/file_open")
+__description("file_kptr: xchg and reinsert")
+__success
+int xchg_reinsert_file_kptr(struct file *ctx_file)
+{
+ struct file_map_value *mapval;
+ struct file *file, *old;
+ int zero = 0;
+
+ (void)ctx_file;
+
+ mapval = bpf_map_lookup_elem(&stashed_files, &zero);
+ if (!mapval)
+ return 0;
+
+ file = bpf_get_task_exe_file(bpf_get_current_task_btf());
+ if (!file)
+ return 0;
+
+ old = bpf_kptr_xchg(&mapval->file, file);
+ if (old)
+ bpf_put_file(old);
+
+ file = bpf_kptr_xchg(&mapval->file, NULL);
+ if (!file)
+ return 0;
+
+ old = bpf_kptr_xchg(&mapval->file, file);
+ if (old) {
+ bpf_put_file(old);
+ return 0;
+ }
+
+ file = bpf_kptr_xchg(&mapval->file, NULL);
+ if (file)
+ bpf_put_file(file);
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.53.0
* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-24 19:22 ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
@ 2026-04-24 19:52 ` bot+bpf-ci
2026-04-24 19:59 ` sashiko-bot
1 sibling, 0 replies; 10+ messages in thread
From: bot+bpf-ci @ 2026-04-24 19:52 UTC (permalink / raw)
To: utilityemal77, ast, daniel, andrii, eddyz87, memxor
Cc: martin.lau, song, yonghong.song, jolsa, bpf, mic, utilityemal77,
ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index baa12b24bb64..4739e2bf0f31 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -20,6 +20,7 @@
> #include <linux/proc_ns.h>
> #include <linux/sched/task.h>
> #include <linux/security.h>
> +#include <linux/fs.h>
> #include <linux/btf_ids.h>
> #include <linux/bpf_mem_alloc.h>
> #include <linux/kasan.h>
> @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> }
> CFI_NOSEAL(bpf_task_release_dtor);
>
> +__bpf_kfunc void bpf_file_release_dtor(void *file)
> +{
> + fput((struct file *)file);
> +}
> +
> +CFI_NOSEAL(bpf_file_release_dtor);
Is it safe to call fput() here? BPF programs can update or delete map
elements from NMI context (for example, in BPF_PROG_TYPE_PERF_EVENT
programs), which will synchronously invoke this destructor.
When fput() is called in an interrupt context and the refcount reaches 0,
it adds the file to the delayed_fput_list and calls schedule_delayed_work().
However, schedule_delayed_work() is not NMI-safe, as it attempts to acquire
spinlocks such as the workqueue pool lock and timer_base->lock.
If the NMI interrupted a context that already held one of these locks, won't
this result in a hard deadlock?
Multiple reviewers raised this concern in v1 and v2 of this patch series.
In response to v1, you acknowledged this as a "real bug / reproducible
deadlock" but stated it would be addressed separately as an "orthogonal
issue." However, v3 still uses the same unsafe fput() call without any
NMI-safe deferral mechanism (e.g., irq_work).
Reference: https://lore.kernel.org/bpf/20260421010536.17FB1C19425@smtp.kernel.org/
> +
> #ifdef CONFIG_CGROUPS
> /**
> * bpf_cgroup_acquire - Acquire a reference to a cgroup. A cgroup acquired by
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24908056518
* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-24 19:22 ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
2026-04-24 19:52 ` bot+bpf-ci
@ 2026-04-24 19:59 ` sashiko-bot
2026-04-24 20:12 ` Justin Suess
1 sibling, 1 reply; 10+ messages in thread
From: sashiko-bot @ 2026-04-24 19:59 UTC (permalink / raw)
To: Justin Suess; +Cc: bpf
Sashiko AI review found 1 potential issue(s):
- [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
--
commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
Author: Justin Suess <utilityemal77@gmail.com>
bpf: Implement dtor for struct file BTF ID
This commit implements a destructor for struct file, allowing it to be
stored in a map as a referenced kptr. The destructor calls fput to
release the reference.
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 2bb60200c2665e..ee5ceb82e4f812 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
[ ... ]
> @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> }
> CFI_NOSEAL(bpf_task_release_dtor);
>
> +__bpf_kfunc void bpf_file_release_dtor(void *file)
> +{
> + fput((struct file *)file);
> +}
Can this cause a deadlock if the destructor is invoked from NMI context?
If an NMI-context BPF program (such as a perf event tracing program)
updates or deletes a map element containing a struct file kptr, the BPF
framework will synchronously invoke this destructor to clean up the old
kptr.
When fput() drops the reference count to zero, it calls
__fput_deferred(). Since in_interrupt() evaluates to true in NMI
context, __fput_deferred() bypasses task_work_add() and instead calls
schedule_delayed_work().
This leads to the following call chain:
bpf_file_release_dtor()
fput()
__fput_deferred()
schedule_delayed_work()
add_timer()
The add_timer() function attempts to acquire the timer_base->lock raw
spinlock. If the NMI interrupted a thread that was already holding this
lock, the NMI handler will spin indefinitely waiting for it, resulting
in a hard deadlock.
Would it be safer to defer the fput() operation using an explicitly
NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
contexts?
> +
> +CFI_NOSEAL(bpf_file_release_dtor);
> +
> #ifdef CONFIG_CGROUPS
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260424192252.695240-1-utilityemal77@gmail.com?part=1
* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-24 19:59 ` sashiko-bot
@ 2026-04-24 20:12 ` Justin Suess
2026-04-24 22:05 ` Alexei Starovoitov
0 siblings, 1 reply; 10+ messages in thread
From: Justin Suess @ 2026-04-24 20:12 UTC (permalink / raw)
To: sashiko; +Cc: bpf
On Fri, Apr 24, 2026 at 07:59:57PM +0000, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 1 potential issue(s):
> - [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
> --
>
> commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
> Author: Justin Suess <utilityemal77@gmail.com>
>
> bpf: Implement dtor for struct file BTF ID
>
> This commit implements a destructor for struct file, allowing it to be
> stored in a map as a referenced kptr. The destructor calls fput to
> release the reference.
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 2bb60200c2665e..ee5ceb82e4f812 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> [ ... ]
> > @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> > }
> > CFI_NOSEAL(bpf_task_release_dtor);
> >
> > +__bpf_kfunc void bpf_file_release_dtor(void *file)
> > +{
> > + fput((struct file *)file);
> > +}
>
> Can this cause a deadlock if the destructor is invoked from NMI context?
>
> If an NMI-context BPF program (such as a perf event tracing program)
> updates or deletes a map element containing a struct file kptr, the BPF
> framework will synchronously invoke this destructor to clean up the old
> kptr.
>
> When fput() drops the reference count to zero, it calls
> __fput_deferred(). Since in_interrupt() evaluates to true in NMI
> context, __fput_deferred() bypasses task_work_add() and instead calls
> schedule_delayed_work().
>
> This leads to the following call chain:
>
> bpf_file_release_dtor()
> fput()
> __fput_deferred()
> schedule_delayed_work()
> add_timer()
>
> The add_timer() function attempts to acquire the timer_base->lock raw
> spinlock. If the NMI interrupted a thread that was already holding this
> lock, the NMI handler will spin indefinitely waiting for it, resulting
> in a hard deadlock.
>
> Would it be safer to defer the fput() operation using an explicitly
> NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
> contexts?
>
For (human) reviewer context: the NMI issue needs to be addressed
separately. [1]
The AI likely won't follow this, but just using irq_work wouldn't be
right either: it would break operation ordering for maps, and fixing it
here wouldn't fix the other dtors that are broken in NMI context
(cgroup/task_struct).
Anyway, the AI didn't find any other problems and the test issue is
fixed, so this should be ready for another look.
[1] : https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
> > +
> > +CFI_NOSEAL(bpf_file_release_dtor);
> > +
> > #ifdef CONFIG_CGROUPS
> [ ... ]
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260424192252.695240-1-utilityemal77@gmail.com?part=1
* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-24 20:12 ` Justin Suess
@ 2026-04-24 22:05 ` Alexei Starovoitov
2026-04-24 23:20 ` Justin Suess
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2026-04-24 22:05 UTC (permalink / raw)
To: Justin Suess; +Cc: sashiko, bpf
On Fri, Apr 24, 2026 at 1:12 PM Justin Suess <utilityemal77@gmail.com> wrote:
>
> On Fri, Apr 24, 2026 at 07:59:57PM +0000, sashiko-bot@kernel.org wrote:
> > Sashiko AI review found 1 potential issue(s):
> > - [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
> > --
> >
> > commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
> > Author: Justin Suess <utilityemal77@gmail.com>
> >
> > bpf: Implement dtor for struct file BTF ID
> >
> > This commit implements a destructor for struct file, allowing it to be
> > stored in a map as a referenced kptr. The destructor calls fput to
> > release the reference.
> >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index 2bb60200c2665e..ee5ceb82e4f812 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > [ ... ]
> > > @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> > > }
> > > CFI_NOSEAL(bpf_task_release_dtor);
> > >
> > > +__bpf_kfunc void bpf_file_release_dtor(void *file)
> > > +{
> > > + fput((struct file *)file);
> > > +}
> >
> > Can this cause a deadlock if the destructor is invoked from NMI context?
> >
> > If an NMI-context BPF program (such as a perf event tracing program)
> > updates or deletes a map element containing a struct file kptr, the BPF
> > framework will synchronously invoke this destructor to clean up the old
> > kptr.
> >
> > When fput() drops the reference count to zero, it calls
> > __fput_deferred(). Since in_interrupt() evaluates to true in NMI
> > context, __fput_deferred() bypasses task_work_add() and instead calls
> > schedule_delayed_work().
> >
> > This leads to the following call chain:
> >
> > bpf_file_release_dtor()
> > fput()
> > __fput_deferred()
> > schedule_delayed_work()
> > add_timer()
> >
> > The add_timer() function attempts to acquire the timer_base->lock raw
> > spinlock. If the NMI interrupted a thread that was already holding this
> > lock, the NMI handler will spin indefinitely waiting for it, resulting
> > in a hard deadlock.
> >
> > Would it be safer to defer the fput() operation using an explicitly
> > NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
> > contexts?
> >
> For (human) reviewer context: the NMI stuff needs to be addressed separately. [1]
>
> I guess the AI isn't going to understand, but just doing irq_work wouldn't
> be right either: it would break operation ordering for maps, and
> fixing it here wouldn't fix the other dtors broken in NMI
> (cgroup/task_struct).
Will break operation ordering?
What do you mean?
I feel we should fix things first before being subject
to more of these bugs.
Why can't we defer the whole map element to irq_work if in_nmi
and call all the dtors there?
should be a simple fix.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-24 22:05 ` Alexei Starovoitov
@ 2026-04-24 23:20 ` Justin Suess
2026-04-25 1:25 ` Alexei Starovoitov
0 siblings, 1 reply; 10+ messages in thread
From: Justin Suess @ 2026-04-24 23:20 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: sashiko, bpf
On Fri, Apr 24, 2026 at 03:05:47PM -0700, Alexei Starovoitov wrote:
> On Fri, Apr 24, 2026 at 1:12 PM Justin Suess <utilityemal77@gmail.com> wrote:
> >
> > On Fri, Apr 24, 2026 at 07:59:57PM +0000, sashiko-bot@kernel.org wrote:
> > > Sashiko AI review found 1 potential issue(s):
> > > - [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
> > > --
> > >
> > > commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
> > > Author: Justin Suess <utilityemal77@gmail.com>
> > >
> > > bpf: Implement dtor for struct file BTF ID
> > >
> > > This commit implements a destructor for struct file, allowing it to be
> > > stored in a map as a referenced kptr. The destructor calls fput to
> > > release the reference.
> > >
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index 2bb60200c2665e..ee5ceb82e4f812 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > [ ... ]
> > > > @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> > > > }
> > > > CFI_NOSEAL(bpf_task_release_dtor);
> > > >
> > > > +__bpf_kfunc void bpf_file_release_dtor(void *file)
> > > > +{
> > > > + fput((struct file *)file);
> > > > +}
> > >
> > > Can this cause a deadlock if the destructor is invoked from NMI context?
> > >
> > > If an NMI-context BPF program (such as a perf event tracing program)
> > > updates or deletes a map element containing a struct file kptr, the BPF
> > > framework will synchronously invoke this destructor to clean up the old
> > > kptr.
> > >
> > > When fput() drops the reference count to zero, it calls
> > > __fput_deferred(). Since in_interrupt() evaluates to true in NMI
> > > context, __fput_deferred() bypasses task_work_add() and instead calls
> > > schedule_delayed_work().
> > >
> > > This leads to the following call chain:
> > >
> > > bpf_file_release_dtor()
> > > fput()
> > > __fput_deferred()
> > > schedule_delayed_work()
> > > add_timer()
> > >
> > > The add_timer() function attempts to acquire the timer_base->lock raw
> > > spinlock. If the NMI interrupted a thread that was already holding this
> > > lock, the NMI handler will spin indefinitely waiting for it, resulting
> > > in a hard deadlock.
> > >
> > > Would it be safer to defer the fput() operation using an explicitly
> > > NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
> > > contexts?
> > >
> > For (human) reviewer context: the NMI stuff needs to be addressed separately. [1]
> >
> > I guess the AI isn't going to understand, but just doing irq_work wouldn't
> > be right either: it would break operation ordering for maps, and
> > fixing it here wouldn't fix the other dtors broken in NMI
> > (cgroup/task_struct).
>
> Will break operation ordering?
> What do you mean?
>
Basically, if we did irq_work_queue() for one map op and then tried to
queue the same irq_work again, the second call would just fail: that
work item has to run (the queue has to be drained) before it can be
queued again.
So the ordering can't be:
1. irq_work_queue
2. irq_work_queue (again on the same work item)
3. hard interrupt ends
It has to be:
1. irq_work_queue (everything)
2. hard interrupt ends.
In the first ordering above, step 2 fails because the work item is still
pending.
> I feel we should fix things first before being subject
> to more of these bugs.
> Why can't we defer the whole map element to irq_work if in_nmi
> and call all the dtors there?
irq_work won't work if there's pending work already: irq_work_queue()
just returns false and won't let you queue anything until the pending
work has run. So it would work for one element, but then we'd be stuck
and would need another free irq_work to free the next element; in
effect, one irq_work for every element we free.
So either:
1. allocating memory for a new irq_work on the fly, which is an
operation that can fail. So if we're under memory pressure the
element never gets freed.
2. preallocate an irq_work for every element ahead of time.
...
I think the solution for this might be to just reject any kfunc or dtor
that isn't explicitly marked NMI-safe from any program that may run in
NMI, with a kfunc flag that marks it as NMI-safe (KF_NMI or similar),
and extending the concept of kfunc flags to dtors so the verifier can
handle them as well.
My reasoning is:
1. It's very easy to inadvertently introduce new kfuncs/dtors that would
break under NMI (there are two examples upstream, the task_struct and
cgroup dtors).
2. There are very few NMI contexts reachable from BPF. I only found three
cases: the tracepoints for irq_handler_entry, irq_handler_exit and
nmi_handler.
3. If the user really needs to do NMI-unsafe stuff, why can't we just have
them call bpf_task_work_schedule_resume_impl, which is designated for this
purpose?
Justin
> should be a simple fix.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-24 23:20 ` Justin Suess
@ 2026-04-25 1:25 ` Alexei Starovoitov
2026-04-25 2:17 ` Justin Suess
0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2026-04-25 1:25 UTC (permalink / raw)
To: Justin Suess; +Cc: sashiko, bpf
On Fri, Apr 24, 2026 at 4:20 PM Justin Suess <utilityemal77@gmail.com> wrote:
>
>
> So either:
>
> 1. allocating memory for a new irq_work on the fly, which is an
> operation that can fail. So if we're under memory pressure the
> element never gets freed.
>
> 2. preallocate an irq work for every element ahead of time.
obviously that's not what I was referring to.
irq_work is rarely used directly like that.
The common pattern is:
defer_free()
{
if (llist_add(head + s->offset, &df->objects))
irq_work_queue(&df->work);
but before going there.
Please enumerate "other dtors broken in NMI".
What exactly is broken?
What is the sequence of events?
All map types with kptrs? or particular ones?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
2026-04-25 1:25 ` Alexei Starovoitov
@ 2026-04-25 2:17 ` Justin Suess
0 siblings, 0 replies; 10+ messages in thread
From: Justin Suess @ 2026-04-25 2:17 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: sashiko, bpf
On Fri, Apr 24, 2026 at 06:25:31PM -0700, Alexei Starovoitov wrote:
> On Fri, Apr 24, 2026 at 4:20 PM Justin Suess <utilityemal77@gmail.com> wrote:
> >
> >
> > So either:
> >
> > 1. allocating memory for a new irq_work on the fly, which is an
> > operation that can fail. So if we're under memory pressure the
> > element never gets freed.
> >
> > 2. preallocate an irq work for every element ahead of time.
>
> obviously that's not what I was referring to.
> irq_work is rarely used directly like that.
> The common pattern is:
> defer_free()
> {
> if (llist_add(head + s->offset, &df->objects))
> irq_work_queue(&df->work);
>
That's smart. So you can use it like a doorbell pattern and avoid
re-queueing the work, and llist avoids the locking.
> but before going there.
> Please enumerate "other dtors broken in NMI".
> What exactly is broken?
> What is the sequence of events?
> All map types with kptrs? or particular ones?
Only referenced kptrs.
I only tested hashmaps, not any other map types. But I don't see any
reason why this wouldn't apply to other map types.
Here are the specific dtors that I know/suspect are buggy in NMI:
1. bpf_task_release_dtor I confirmed; see below.
2. bpf_kfree_skb_dtor can do call_rcu_hurry.
3. bpf_crypto_ctx_release_dtor does call_rcu.
4. bpf_cgroup_release_dtor frees via work queue and takes cgroup_mutex.
bpf_cpumask_release_dtor I think is safe.
I made a reproducer for one of these (the task_struct dtor).
I triggered the issue by deleting the last reference to a task_struct
kptr from within a tp_btf/nmi_handler prog.
bpf_map_delete_elem()
 -> htab_map_delete_elem()
 -> free_htab_elem()
 -> bpf_obj_free_fields()
 -> bpf_task_release_dtor()
 -> put_task_struct_rcu_user()
 -> call_rcu()
Log:
[ 1.358433] ================================
[ 1.358433] WARNING: inconsistent lock state
[ 1.358434] 7.0.0-11169-ge4ef174588b8-dirty #16 Tainted: G OE
[ 1.358435] --------------------------------
[ 1.358436] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[ 1.358436] test_progs/134 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 1.358438] ffff8ae3bbc6f0e8 (&rdp->nocb_lock){....}-{2:2}, at: __call_rcu_common.constprop.0+0x316/0x740
This is done by loading a task_struct referenced kptr into a map and
deleting it later from within a tp_btf:nmi_handler BPF_PROG_TYPE_TRACING
program.
For the task_struct case, I have a bug report:
https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
With reproducer:
https://gist.githubusercontent.com/RazeLighter777/5539336d79ab1854f9e9550c6dcab118/raw/082f1eeb2dd445936e64dd3a33861764690bde82/task_struct_dtor_deadlock.patch
I didn't make a reproducer for other cases.
Thanks for the tip with llist and irq_work, that blew my mind a bit.
Justin
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-25 2:17 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-24 19:22 [PATCH bpf-next v3 0/2] Allow storing referenced struct file kptrs in BPF maps Justin Suess
2026-04-24 19:22 ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
2026-04-24 19:52 ` bot+bpf-ci
2026-04-24 19:59 ` sashiko-bot
2026-04-24 20:12 ` Justin Suess
2026-04-24 22:05 ` Alexei Starovoitov
2026-04-24 23:20 ` Justin Suess
2026-04-25 1:25 ` Alexei Starovoitov
2026-04-25 2:17 ` Justin Suess
2026-04-24 19:22 ` [PATCH bpf-next v3 2/2] selftests/bpf: Add test for map-stored struct file kptrs Justin Suess
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox