From: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>
To: "Justin Suess" <utilityemal77@gmail.com>, <ast@kernel.org>,
<daniel@iogearbox.net>, <andrii@kernel.org>, <eddyz87@gmail.com>,
<memxor@gmail.com>
Cc: <martin.lau@linux.dev>, <song@kernel.org>,
<yonghong.song@linux.dev>, <jolsa@kernel.org>,
<bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 3/4] bpf: Fix deadlock in kptr dtor in nmi
Date: Wed, 29 Apr 2026 02:37:32 -0700 [thread overview]
Message-ID: <DI5JDE24ICIE.1PAT8HWYK9ENS@gmail.com> (raw)
In-Reply-To: <20260428201422.1518903-4-utilityemal77@gmail.com>
On Tue Apr 28, 2026 at 1:14 PM PDT, Justin Suess wrote:
> Defer freeing of referenced kptrs using irq_work queue.
>
> This fixes a deadlock in BPF tracing programs running under NMI.
>
> Each kptr is tagged with an auxiliary data field storing an llist_node
> and a pointer to the object to be freed. These are assembled together
> to form a queue for deletion outside NMI.
>
> Add a field to each data structure capable of holding referenced kptrs
> to store the llist_head, as well as an irq_work struct to the btf kptr
> field to store the task callback.
>
> The llist_nodes are linked in the queue safely, allowing them to be torn
> down once NMI is over.
>
> This irq_work struct is foribly synchronized on btf teardown, enabled by
> the change in btf cleanup code introduced in the previous commit, adding
> the rcu_work teardown.
>
> At dtor time, if the execution is in an nmi context, enqueue the
> referenced kptr nodes in the llist_head and enqueue a job to drain the
> list, calling the respective dtor callback from a safe context.
>
> If running outside nmi, use synchronous dtor path.
>
> This touches arraymap, hashtab, and bpf local storage. It's important to
> note however, that the bpf_local_storage code rejects nmi updates
> already, the code changes in that case are just to accommodate the changes
> to the record extending the kptr.
>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Justin Suess <utilityemal77@gmail.com>
> Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
> Signed-off-by: Justin Suess <utilityemal77@gmail.com>
> ---
> include/linux/bpf.h | 69 ++++++++++++
> kernel/bpf/arraymap.c | 36 ++++++-
> kernel/bpf/bpf_local_storage.c | 13 ++-
> kernel/bpf/btf.c | 6 +-
> kernel/bpf/hashtab.c | 181 +++++++++++++++++++++++++++----
> kernel/bpf/syscall.c | 190 +++++++++++++++++++++++++++++++--
> 6 files changed, 456 insertions(+), 39 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 715b6df9c403..037bdadbed96 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -9,6 +9,8 @@
>
> #include <crypto/sha2.h>
> #include <linux/workqueue.h>
> +#include <linux/irq_work.h>
> +#include <linux/llist.h>
> #include <linux/file.h>
> #include <linux/percpu.h>
> #include <linux/err.h>
> @@ -234,6 +236,10 @@ struct btf_field_kptr {
> * program-allocated, dtor is NULL, and __bpf_obj_drop_impl is used
> */
> btf_dtor_kfunc_t dtor;
> + struct irq_work irq_work;
> + struct llist_head irq_work_items;
> + struct llist_head free_list;
> + u32 aux_off;
> u32 btf_id;
> };
This is extreme per-field overhead
500 extra lines to fix a corner case?
That's not what I suggested. Can the whole thing will be
done with _single_ global llist and irq_work? I bet yes.
Think it through and send us prompt for review instead of code.
pw-bot: cr
next prev parent reply other threads:[~2026-04-29 9:37 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 20:14 [PATCH bpf-next 0/4] bpf: Fix NMI deadlock in referenced kptr destructors Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 1/4] bpf: Limit fields used in btf_record_equal comparisons Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 2/4] bpf: Use rcu_work in BTF teardown Justin Suess
2026-04-29 1:49 ` sashiko-bot
2026-05-20 5:37 ` kernel test robot
2026-04-28 20:14 ` [PATCH bpf-next 3/4] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-04-29 2:29 ` sashiko-bot
2026-04-29 9:37 ` Alexei Starovoitov [this message]
2026-04-29 16:21 ` Justin Suess
2026-05-02 14:33 ` Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 4/4] selftests/bpf: Add kptr nmi deadlock reproducer Justin Suess
2026-04-29 3:39 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DI5JDE24ICIE.1PAT8HWYK9ENS@gmail.com \
--to=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=jolsa@kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=song@kernel.org \
--cc=utilityemal77@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.