BPF List
 help / color / mirror / Atom feed
From: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>
To: "Justin Suess" <utilityemal77@gmail.com>, <ast@kernel.org>,
	<daniel@iogearbox.net>, <andrii@kernel.org>, <eddyz87@gmail.com>,
	<memxor@gmail.com>
Cc: <martin.lau@linux.dev>, <song@kernel.org>,
	<yonghong.song@linux.dev>, <jolsa@kernel.org>,
	<bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 3/4] bpf: Fix deadlock in kptr dtor in nmi
Date: Wed, 29 Apr 2026 02:37:32 -0700	[thread overview]
Message-ID: <DI5JDE24ICIE.1PAT8HWYK9ENS@gmail.com> (raw)
In-Reply-To: <20260428201422.1518903-4-utilityemal77@gmail.com>

On Tue Apr 28, 2026 at 1:14 PM PDT, Justin Suess wrote:
> Defer freeing of referenced kptrs using irq_work queue.
>
> This fixes a deadlock in BPF tracing programs running under NMI.
>
> Each kptr is tagged with an auxiliary data field storing an llist_node
> and a pointer to the object to be freed. These are assembled together
> to form a queue for deletion outside NMI.
>
> Add a field to each data structure capable of holding referenced kptrs
> to store the llist_head, as well as an irq_work struct to the btf kptr
> field to store the task callback.
>
> The llist_nodes are linked in the queue safely, allowing them to be torn
> down once NMI is over.
>
> This irq_work struct is foribly synchronized on btf teardown, enabled by
> the change in btf cleanup code introduced in the previous commit, adding
> the rcu_work teardown.
>
> At dtor time, if the execution is in an nmi context, enqueue the
> referenced kptr nodes in the llist_head and enqueue a job to drain the
> list, calling the respective dtor callback from a safe context.
>
> If running outside nmi, use synchronous dtor path.
>
> This touches arraymap, hashtab, and bpf local storage. It's important to
> note however, that the bpf_local_storage code rejects nmi updates
> already, the code changes in that case are just to accommodate the changes
> to the record extending the kptr.
>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Justin Suess <utilityemal77@gmail.com>
> Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
> Signed-off-by: Justin Suess <utilityemal77@gmail.com>
> ---
>  include/linux/bpf.h            |  69 ++++++++++++
>  kernel/bpf/arraymap.c          |  36 ++++++-
>  kernel/bpf/bpf_local_storage.c |  13 ++-
>  kernel/bpf/btf.c               |   6 +-
>  kernel/bpf/hashtab.c           | 181 +++++++++++++++++++++++++++----
>  kernel/bpf/syscall.c           | 190 +++++++++++++++++++++++++++++++--
>  6 files changed, 456 insertions(+), 39 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 715b6df9c403..037bdadbed96 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -9,6 +9,8 @@
>  
>  #include <crypto/sha2.h>
>  #include <linux/workqueue.h>
> +#include <linux/irq_work.h>
> +#include <linux/llist.h>
>  #include <linux/file.h>
>  #include <linux/percpu.h>
>  #include <linux/err.h>
> @@ -234,6 +236,10 @@ struct btf_field_kptr {
>  	 * program-allocated, dtor is NULL,  and __bpf_obj_drop_impl is used
>  	 */
>  	btf_dtor_kfunc_t dtor;
> +	struct irq_work irq_work;
> +	struct llist_head irq_work_items;
> +	struct llist_head free_list;
> +	u32 aux_off;
>  	u32 btf_id;
>  };

This is extreme per-field overhead
500 extra lines to fix a corner case?
That's not what I suggested. Can the whole thing will be
done with _single_ global llist and irq_work? I bet yes.
Think it through and send us prompt for review instead of code.

pw-bot: cr


  parent reply	other threads:[~2026-04-29  9:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-28 20:14 [PATCH bpf-next 0/4] bpf: Fix NMI deadlock in referenced kptr destructors Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 1/4] bpf: Limit fields used in btf_record_equal comparisons Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 2/4] bpf: Use rcu_work in BTF teardown Justin Suess
2026-04-29  1:49   ` sashiko-bot
2026-04-28 20:14 ` [PATCH bpf-next 3/4] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-04-29  2:29   ` sashiko-bot
2026-04-29  9:37   ` Alexei Starovoitov [this message]
2026-04-29 16:21     ` Justin Suess
2026-05-02 14:33       ` Justin Suess
2026-04-28 20:14 ` [PATCH bpf-next 4/4] selftests/bpf: Add kptr nmi deadlock reproducer Justin Suess
2026-04-29  3:39   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DI5JDE24ICIE.1PAT8HWYK9ENS@gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=song@kernel.org \
    --cc=utilityemal77@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox