From: Justin Suess <utilityemal77@gmail.com>
To: bot+bpf-ci@kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
eddyz87@gmail.com, memxor@gmail.com, martin.lau@linux.dev,
song@kernel.org, yonghong.song@linux.dev, jolsa@kernel.org,
bpf@vger.kernel.org, mykyta.yatsenko5@gmail.com,
alexei.starovoitov@gmail.com, martin.lau@kernel.org,
clm@meta.com, ihor.solodrai@linux.dev
Subject: Re: [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI
Date: Thu, 7 May 2026 14:52:15 -0400 [thread overview]
Message-ID: <afzeWzdWQ7eDJ-hP@zenbox> (raw)
In-Reply-To: <77bd1e436e9b8a4679a4e60adf5971f93688ff28ad0a9b16a0adf2b8172dd116@mail.kernel.org>
On Thu, May 07, 2026 at 06:43:29PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>
> [ ... ]
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>
> [ ... ]
>
> > diff --git a/kernel/bpf/fixups.c b/kernel/bpf/fixups.c
>
> [ ... ]
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>
> [ ... ]
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 3b1f0ba02f61..d34fdb99eb8a 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -7,6 +7,7 @@
> > #include <linux/bpf_trace.h>
> > #include <linux/bpf_lirc.h>
> > #include <linux/bpf_verifier.h>
> > +#include <linux/bpf_mem_alloc.h>
> > #include <linux/bsearch.h>
> > #include <linux/btf.h>
> > #include <linux/hex.h>
> > @@ -19,6 +20,7 @@
> > #include <linux/fdtable.h>
> > #include <linux/file.h>
> > #include <linux/fs.h>
> > +#include <linux/irq_work.h>
> > #include <linux/license.h>
> > #include <linux/filter.h>
> > #include <linux/kernel.h>
> > @@ -42,6 +44,8 @@
> > #include <linux/cookie.h>
> > #include <linux/verification.h>
> >
> > +#include "percpu_freelist.h"
> > +
> > #include <net/netfilter/nf_bpf_link.h>
> > #include <net/netkit.h>
> > #include <net/tcx.h>
> > @@ -65,6 +69,111 @@ static DEFINE_SPINLOCK(map_idr_lock);
> > static DEFINE_IDR(link_idr);
> > static DEFINE_SPINLOCK(link_idr_lock);
> >
> > +struct bpf_dtor_kptr_work {
> > + struct pcpu_freelist_node fnode;
> > + void *obj;
> > + btf_dtor_kfunc_t dtor;
> > +};
> > +
> > +/* Queue pending dtors; the idle pool uses a global pcpu_freelist. */
> > +static struct pcpu_freelist bpf_dtor_kptr_jobs;
> > +static struct pcpu_freelist bpf_dtor_kptr_idle;
> > +/* Keep surplus = total - needed = idle - refs >= 0 so NMI frees never need to allocate. */
> > +static atomic_long_t bpf_dtor_kptr_surplus = ATOMIC_LONG_INIT(0);
>
> An unaddressed review comment from the v2 discussion raises a design
> concern about the complexity of this implementation. Mykyta Yatsenko
> suggested an alternative design where bpf_dtor_kptr_work is stored in the
> kptr map slot itself (a bpf_kptr_box approach) instead of using two lists
> (idle and armed) with 3 atomics and headroom/trimming management.
>
> From the lore discussion at:
> https://lore.kernel.org/bpf/51a054a0-e57f-49dc-9527-36da0535087c@gmail.com/
>
> Mykyta's suggestion was:
>
> "I think this still looks too complex:
> * 2 lists - idle list and armed list
> * 3 atomics, controlling demand/supply
> * headroom/trimming management
>
> The complexity introduced for performance reasons, but I'm not sure if
> the tradeoff is worth it.
>
> What about the next design:
>
> Instead of idle list, store bpf_dtor_kptr_work in the kptr map slot
> itself. Use kmalloc_nolock() to allocate bpf_dtor_kptr_work on the first
> xchg just once per map value, then reuse it across xchg in/out."
>
> While Justin responded with an explanation of ABBA ordering issues, and
> Mykyta provided a more detailed counter-proposal with RCU protection and
> atomic detach/attach, the v3 implementation does not adopt this
> suggestion and still uses the pcpu_freelist approach with a surplus
> counter.
>
> Was there a specific technical reason to continue with the two-list
> approach rather than exploring the bpf_kptr_box alternative, or would it
> be worth revisiting this design choice?
>
This was addressed.
See https://lore.kernel.org/bpf/afy8yl1bleXmO78H@suesslenovo/
storing a pointer to the bpf_kptr_box, or any other pointer that isn't
the original kptr in the map slot would break direct load operations
on ref kptrs in maps requiring patching the program and breaking
bpf assumptions.
> [ ... ]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>
> [ ... ]
>
> This commit fixes a deadlock bug but lacks a Fixes: tag. The bug was
> introduced in commit 14a324f6a67e ("bpf: Wire up freeing of referenced
> kptr") which added bpf_map_free_kptrs() that calls kptr destructors
> inline without checking for NMI context.
>
> When BPF programs attached to tp_btf/nmi_handler delete map entries,
> destructors like bpf_cpumask_release() that take RCU locks are called
> from NMI, causing system deadlock.
>
> Suggested: Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
>
>
I'll add the Fixes: tag in next reroll.
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25513277524
next prev parent reply other threads:[~2026-05-07 18:52 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-07 17:54 [bpf-next v3 0/2] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-05-07 17:54 ` [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI Justin Suess
2026-05-07 18:43 ` bot+bpf-ci
2026-05-07 18:52 ` Justin Suess [this message]
2026-05-07 23:45 ` sashiko-bot
2026-05-10 15:13 ` Justin Suess
2026-05-10 22:38 ` Alexei Starovoitov
2026-05-11 1:49 ` Justin Suess
2026-05-11 15:51 ` Alexei Starovoitov
2026-05-11 16:38 ` Justin Suess
2026-05-11 17:18 ` Alexei Starovoitov
2026-05-11 20:10 ` Kumar Kartikeya Dwivedi
2026-05-12 1:43 ` Justin Suess
2026-05-12 1:46 ` Kumar Kartikeya Dwivedi
2026-05-12 1:55 ` Alexei Starovoitov
2026-05-12 2:03 ` Kumar Kartikeya Dwivedi
2026-05-12 2:10 ` Alexei Starovoitov
2026-05-12 2:13 ` Kumar Kartikeya Dwivedi
2026-05-12 2:07 ` Justin Suess
2026-05-12 2:08 ` Kumar Kartikeya Dwivedi
2026-05-11 19:22 ` Justin Suess
2026-05-07 17:54 ` [bpf-next v3 2/2] selftests/bpf: Add kptr destructor NMI exerciser Justin Suess
2026-05-08 0:03 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=afzeWzdWQ7eDJ-hP@zenbox \
--to=utilityemal77@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bot+bpf-ci@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=clm@meta.com \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=ihor.solodrai@linux.dev \
--cc=jolsa@kernel.org \
--cc=martin.lau@kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=mykyta.yatsenko5@gmail.com \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.