From: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>
To: "Justin Suess" <utilityemal77@gmail.com>, <ast@kernel.org>,
<daniel@iogearbox.net>, <andrii@kernel.org>, <eddyz87@gmail.com>,
<memxor@gmail.com>
Cc: <martin.lau@linux.dev>, <song@kernel.org>,
<yonghong.song@linux.dev>, <jolsa@kernel.org>,
<bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 3/4] bpf: Fix deadlock in kptr dtor in nmi
Date: Wed, 29 Apr 2026 02:37:32 -0700 [thread overview]
Message-ID: <DI5JDE24ICIE.1PAT8HWYK9ENS@gmail.com> (raw)
In-Reply-To: <20260428201422.1518903-4-utilityemal77@gmail.com>
On Tue Apr 28, 2026 at 1:14 PM PDT, Justin Suess wrote:
> Defer freeing of referenced kptrs using irq_work queue.
>
> This fixes a deadlock in BPF tracing programs running under NMI.
>
> Each kptr is tagged with an auxiliary data field storing an llist_node
> and a pointer to the object to be freed. These are assembled together
> to form a queue for deletion outside NMI.
>
> Add a field to each data structure capable of holding referenced kptrs
> to store the llist_head, as well as an irq_work struct to the btf kptr
> field to store the task callback.
>
> The llist_nodes are linked in the queue safely, allowing them to be torn
> down once NMI is over.
>
> This irq_work struct is foribly synchronized on btf teardown, enabled by
> the change in btf cleanup code introduced in the previous commit, adding
> the rcu_work teardown.
>
> At dtor time, if the execution is in an nmi context, enqueue the
> referenced kptr nodes in the llist_head and enqueue a job to drain the
> list, calling the respective dtor callback from a safe context.
>
> If running outside nmi, use synchronous dtor path.
>
> This touches arraymap, hashtab, and bpf local storage. It's important to
> note however, that the bpf_local_storage code rejects nmi updates
> already, the code changes in that case are just to accommodate the changes
> to the record extending the kptr.
>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Justin Suess <utilityemal77@gmail.com>
> Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
> Signed-off-by: Justin Suess <utilityemal77@gmail.com>
> ---
> include/linux/bpf.h | 69 ++++++++++++
> kernel/bpf/arraymap.c | 36 ++++++-
> kernel/bpf/bpf_local_storage.c | 13 ++-
> kernel/bpf/btf.c | 6 +-
> kernel/bpf/hashtab.c | 181 +++++++++++++++++++++++++++----
> kernel/bpf/syscall.c | 190 +++++++++++++++++++++++++++++++--
> 6 files changed, 456 insertions(+), 39 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 715b6df9c403..037bdadbed96 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -9,6 +9,8 @@
>
> #include <crypto/sha2.h>
> #include <linux/workqueue.h>
> +#include <linux/irq_work.h>
> +#include <linux/llist.h>
> #include <linux/file.h>
> #include <linux/percpu.h>
> #include <linux/err.h>
> @@ -234,6 +236,10 @@ struct btf_field_kptr {
> * program-allocated, dtor is NULL, and __bpf_obj_drop_impl is used
> */
> btf_dtor_kfunc_t dtor;
> + struct irq_work irq_work;
> + struct llist_head irq_work_items;
> + struct llist_head free_list;
> + u32 aux_off;
> u32 btf_id;
> };
This is extreme per-field overhead
500 extra lines to fix a corner case?
That's not what I suggested. Can the whole thing will be
done with _single_ global llist and irq_work? I bet yes.
Think it through and send us prompt for review instead of code.
pw-bot: cr
next prev parent reply other threads:[~2026-04-29 9:37 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 20:14 [PATCH bpf-next 0/4] bpf: Fix NMI deadlock in referenced kptr destructors Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 1/4] bpf: Limit fields used in btf_record_equal comparisons Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 2/4] bpf: Use rcu_work in BTF teardown Justin Suess
2026-04-29 1:49 ` sashiko-bot
2026-04-28 20:14 ` [PATCH bpf-next 3/4] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-04-29 2:29 ` sashiko-bot
2026-04-29 9:37 ` Alexei Starovoitov [this message]
2026-04-29 16:21 ` Justin Suess
2026-05-02 14:33 ` Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 4/4] selftests/bpf: Add kptr nmi deadlock reproducer Justin Suess
2026-04-29 3:39 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DI5JDE24ICIE.1PAT8HWYK9ENS@gmail.com \
--to=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=jolsa@kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=song@kernel.org \
--cc=utilityemal77@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox