From: Justin Suess <utilityemal77@gmail.com>
To: bot+bpf-ci@kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
eddyz87@gmail.com, memxor@gmail.com, martin.lau@linux.dev,
song@kernel.org, yonghong.song@linux.dev, jolsa@kernel.org,
bpf@vger.kernel.org, mykyta.yatsenko5@gmail.com,
alexei.starovoitov@gmail.com, martin.lau@kernel.org,
clm@meta.com, ihor.solodrai@linux.dev
Subject: Re: [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI
Date: Thu, 7 May 2026 14:52:15 -0400 [thread overview]
Message-ID: <afzeWzdWQ7eDJ-hP@zenbox> (raw)
In-Reply-To: <77bd1e436e9b8a4679a4e60adf5971f93688ff28ad0a9b16a0adf2b8172dd116@mail.kernel.org>
On Thu, May 07, 2026 at 06:43:29PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>
> [ ... ]
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>
> [ ... ]
>
> > diff --git a/kernel/bpf/fixups.c b/kernel/bpf/fixups.c
>
> [ ... ]
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>
> [ ... ]
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 3b1f0ba02f61..d34fdb99eb8a 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -7,6 +7,7 @@
> > #include <linux/bpf_trace.h>
> > #include <linux/bpf_lirc.h>
> > #include <linux/bpf_verifier.h>
> > +#include <linux/bpf_mem_alloc.h>
> > #include <linux/bsearch.h>
> > #include <linux/btf.h>
> > #include <linux/hex.h>
> > @@ -19,6 +20,7 @@
> > #include <linux/fdtable.h>
> > #include <linux/file.h>
> > #include <linux/fs.h>
> > +#include <linux/irq_work.h>
> > #include <linux/license.h>
> > #include <linux/filter.h>
> > #include <linux/kernel.h>
> > @@ -42,6 +44,8 @@
> > #include <linux/cookie.h>
> > #include <linux/verification.h>
> >
> > +#include "percpu_freelist.h"
> > +
> > #include <net/netfilter/nf_bpf_link.h>
> > #include <net/netkit.h>
> > #include <net/tcx.h>
> > @@ -65,6 +69,111 @@ static DEFINE_SPINLOCK(map_idr_lock);
> > static DEFINE_IDR(link_idr);
> > static DEFINE_SPINLOCK(link_idr_lock);
> >
> > +struct bpf_dtor_kptr_work {
> > + struct pcpu_freelist_node fnode;
> > + void *obj;
> > + btf_dtor_kfunc_t dtor;
> > +};
> > +
> > +/* Queue pending dtors; the idle pool uses a global pcpu_freelist. */
> > +static struct pcpu_freelist bpf_dtor_kptr_jobs;
> > +static struct pcpu_freelist bpf_dtor_kptr_idle;
> > +/* Keep surplus = total - needed = idle - refs >= 0 so NMI frees never need to allocate. */
> > +static atomic_long_t bpf_dtor_kptr_surplus = ATOMIC_LONG_INIT(0);
>
> An unaddressed review comment from the v2 discussion raises a design
> concern about the complexity of this implementation. Mykyta Yatsenko
> suggested an alternative design where bpf_dtor_kptr_work is stored in the
> kptr map slot itself (a bpf_kptr_box approach) instead of using two lists
> (idle and armed) with 3 atomics and headroom/trimming management.
>
> From the lore discussion at:
> https://lore.kernel.org/bpf/51a054a0-e57f-49dc-9527-36da0535087c@gmail.com/
>
> Mykyta's suggestion was:
>
> "I think this still looks too complex:
> * 2 lists - idle list and armed list
> * 3 atomics, controlling demand/supply
> * headroom/trimming management
>
> The complexity introduced for performance reasons, but I'm not sure if
> the tradeoff is worth it.
>
> What about the next design:
>
> Instead of idle list, store bpf_dtor_kptr_work in the kptr map slot
> itself. Use kmalloc_nolock() to allocate bpf_dtor_kptr_work on the first
> xchg just once per map value, then reuse it across xchg in/out."
>
> While Justin responded with an explanation of ABBA ordering issues, and
> Mykyta provided a more detailed counter-proposal with RCU protection and
> atomic detach/attach, the v3 implementation does not adopt this
> suggestion and still uses the pcpu_freelist approach with a surplus
> counter.
>
> Was there a specific technical reason to continue with the two-list
> approach rather than exploring the bpf_kptr_box alternative, or would it
> be worth revisiting this design choice?
>
This was addressed.
See https://lore.kernel.org/bpf/afy8yl1bleXmO78H@suesslenovo/
storing a pointer to the bpf_kptr_box, or any other pointer that isn't
the original kptr in the map slot would break direct load operations
on ref kptrs in maps requiring patching the program and breaking
bpf assumptions.
> [ ... ]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>
> [ ... ]
>
> This commit fixes a deadlock bug but lacks a Fixes: tag. The bug was
> introduced in commit 14a324f6a67e ("bpf: Wire up freeing of referenced
> kptr") which added bpf_map_free_kptrs() that calls kptr destructors
> inline without checking for NMI context.
>
> When BPF programs attached to tp_btf/nmi_handler delete map entries,
> destructors like bpf_cpumask_release() that take RCU locks are called
> from NMI, causing system deadlock.
>
> Suggested: Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
>
>
I'll add the Fixes: tag in next reroll.
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25513277524
next prev parent reply other threads:[~2026-05-07 18:52 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-07 17:54 [bpf-next v3 0/2] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-05-07 17:54 ` [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI Justin Suess
2026-05-07 18:43 ` bot+bpf-ci
2026-05-07 18:52 ` Justin Suess [this message]
2026-05-07 23:45 ` sashiko-bot
2026-05-10 15:13 ` Justin Suess
2026-05-10 22:38 ` Alexei Starovoitov
2026-05-11 1:49 ` Justin Suess
2026-05-11 15:51 ` Alexei Starovoitov
2026-05-11 16:38 ` Justin Suess
2026-05-11 17:18 ` Alexei Starovoitov
2026-05-11 20:10 ` Kumar Kartikeya Dwivedi
2026-05-12 1:43 ` Justin Suess
2026-05-12 1:46 ` Kumar Kartikeya Dwivedi
2026-05-12 1:55 ` Alexei Starovoitov
2026-05-12 2:03 ` Kumar Kartikeya Dwivedi
2026-05-12 2:10 ` Alexei Starovoitov
2026-05-12 2:13 ` Kumar Kartikeya Dwivedi
2026-05-12 2:07 ` Justin Suess
2026-05-12 2:08 ` Kumar Kartikeya Dwivedi
2026-05-11 19:22 ` Justin Suess
2026-05-07 17:54 ` [bpf-next v3 2/2] selftests/bpf: Add kptr destructor NMI exerciser Justin Suess
2026-05-08 0:03 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=afzeWzdWQ7eDJ-hP@zenbox \
--to=utilityemal77@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bot+bpf-ci@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=clm@meta.com \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=ihor.solodrai@linux.dev \
--cc=jolsa@kernel.org \
--cc=martin.lau@kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=mykyta.yatsenko5@gmail.com \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox