BPF List
From: Justin Suess <utilityemal77@gmail.com>
To: bot+bpf-ci@kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	 eddyz87@gmail.com, memxor@gmail.com, martin.lau@linux.dev,
	song@kernel.org,  yonghong.song@linux.dev, jolsa@kernel.org,
	bpf@vger.kernel.org,  mykyta.yatsenko5@gmail.com,
	alexei.starovoitov@gmail.com, martin.lau@kernel.org,
	 clm@meta.com, ihor.solodrai@linux.dev
Subject: Re: [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI
Date: Thu, 7 May 2026 14:52:15 -0400	[thread overview]
Message-ID: <afzeWzdWQ7eDJ-hP@zenbox> (raw)
In-Reply-To: <77bd1e436e9b8a4679a4e60adf5971f93688ff28ad0a9b16a0adf2b8172dd116@mail.kernel.org>

On Thu, May 07, 2026 at 06:43:29PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> 
> [ ... ]
> 
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> 
> [ ... ]
> 
> > diff --git a/kernel/bpf/fixups.c b/kernel/bpf/fixups.c
> 
> [ ... ]
> 
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> 
> [ ... ]
> 
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 3b1f0ba02f61..d34fdb99eb8a 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -7,6 +7,7 @@
> >  #include <linux/bpf_trace.h>
> >  #include <linux/bpf_lirc.h>
> >  #include <linux/bpf_verifier.h>
> > +#include <linux/bpf_mem_alloc.h>
> >  #include <linux/bsearch.h>
> >  #include <linux/btf.h>
> >  #include <linux/hex.h>
> > @@ -19,6 +20,7 @@
> >  #include <linux/fdtable.h>
> >  #include <linux/file.h>
> >  #include <linux/fs.h>
> > +#include <linux/irq_work.h>
> >  #include <linux/license.h>
> >  #include <linux/filter.h>
> >  #include <linux/kernel.h>
> > @@ -42,6 +44,8 @@
> >  #include <linux/cookie.h>
> >  #include <linux/verification.h>
> >
> > +#include "percpu_freelist.h"
> > +
> >  #include <net/netfilter/nf_bpf_link.h>
> >  #include <net/netkit.h>
> >  #include <net/tcx.h>
> > @@ -65,6 +69,111 @@ static DEFINE_SPINLOCK(map_idr_lock);
> >  static DEFINE_IDR(link_idr);
> >  static DEFINE_SPINLOCK(link_idr_lock);
> >
> > +struct bpf_dtor_kptr_work {
> > +	struct pcpu_freelist_node fnode;
> > +	void *obj;
> > +	btf_dtor_kfunc_t dtor;
> > +};
> > +
> > +/* Queue pending dtors; the idle pool uses a global pcpu_freelist. */
> > +static struct pcpu_freelist bpf_dtor_kptr_jobs;
> > +static struct pcpu_freelist bpf_dtor_kptr_idle;
> > +/* Keep surplus = total - needed = idle - refs >= 0 so NMI frees never need to allocate. */
> > +static atomic_long_t bpf_dtor_kptr_surplus = ATOMIC_LONG_INIT(0);
> 
> An unaddressed review comment from the v2 discussion raises a design
> concern about the complexity of this implementation. Mykyta Yatsenko
> suggested an alternative design where bpf_dtor_kptr_work is stored in the
> kptr map slot itself (a bpf_kptr_box approach) instead of using two lists
> (idle and armed) with 3 atomics and headroom/trimming management.
> 
> From the lore discussion at:
> https://lore.kernel.org/bpf/51a054a0-e57f-49dc-9527-36da0535087c@gmail.com/
> 
> Mykyta's suggestion was:
> 
> "I think this still looks too complex:
>  * 2 lists - idle list and armed list
>  * 3 atomics, controlling demand/supply
>  * headroom/trimming management
> 
> The complexity introduced for performance reasons, but I'm not sure if
> the tradeoff is worth it.
> 
> What about the next design:
> 
> Instead of idle list, store bpf_dtor_kptr_work in the kptr map slot
> itself. Use kmalloc_nolock() to allocate bpf_dtor_kptr_work on the first
> xchg just once per map value, then reuse it across xchg in/out."
> 
> While Justin responded with an explanation of ABBA ordering issues, and
> Mykyta provided a more detailed counter-proposal with RCU protection and
> atomic detach/attach, the v3 implementation does not adopt this
> suggestion and still uses the pcpu_freelist approach with a surplus
> counter.
> 
> Was there a specific technical reason to continue with the two-list
> approach rather than exploring the bpf_kptr_box alternative, or would it
> be worth revisiting this design choice?
> 
This was addressed.

See https://lore.kernel.org/bpf/afy8yl1bleXmO78H@suesslenovo/

Storing a pointer to the bpf_kptr_box, or any other pointer that isn't
the original kptr, in the map slot would break direct load operations on
referenced kptrs in maps: it would require patching the program and would
break existing BPF assumptions.
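
To make the constraint concrete, here is a hypothetical BPF-side sketch
(map, section, and field names are mine, not from the series) of the
direct-load pattern that a box in the slot would invalidate:

/* Hypothetical program relying on direct loads of a referenced kptr. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct map_val {
	struct bpf_cpumask __kptr *mask;	/* slot holds the object pointer itself */
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, u32);
	__type(value, struct map_val);
} vals SEC(".maps");

SEC("tc")
int direct_kptr_load(struct __sk_buff *skb)
{
	struct map_val *v;
	struct bpf_cpumask *m;
	u32 key = 0;

	v = bpf_map_lookup_elem(&vals, &key);
	if (!v)
		return 0;

	/* Direct load: the verifier types v->mask as the kptr itself.
	 * If the slot instead held a pointer to a wrapper such as the
	 * proposed bpf_kptr_box, this load (and every existing program
	 * like it) would need an extra dereference patched in.
	 */
	m = v->mask;
	return m ? 1 : 0;
}

Any wrapper in the slot would therefore have to be made transparent to
these loads, which is exactly the program patching mentioned above.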

> [ ... ]
> 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> 
> [ ... ]
> 
> This commit fixes a deadlock bug but lacks a Fixes: tag. The bug was
> introduced in commit 14a324f6a67e ("bpf: Wire up freeing of referenced
> kptr") which added bpf_map_free_kptrs() that calls kptr destructors
> inline without checking for NMI context.
> 
> When BPF programs attached to tp_btf/nmi_handler delete map entries,
> destructors like bpf_cpumask_release() that take RCU locks are called
> from NMI, causing system deadlock.
> 
> Suggested: Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
> 
>
I'll add the Fixes: tag in the next reroll.
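
For reference against the deadlock described above, the rough shape of
the deferral in this series is: never run the dtor inline from NMI,
instead hand the object to a preallocated work node and punt to irq_work.
A simplified sketch follows; it reuses the struct and freelist names from
the hunk quoted earlier, but the helper and irq_work names are mine, and
the real code additionally handles idle-pool filling, trimming and the
surplus accounting:

/* Simplified sketch, not the actual patch code. */
static void bpf_dtor_kptr_run(struct irq_work *work)
{
	struct pcpu_freelist_node *n;
	struct bpf_dtor_kptr_work *w;

	/* irq_work runs outside NMI, so destructors that take RCU locks
	 * (e.g. bpf_cpumask_release) are safe here.
	 */
	while ((n = pcpu_freelist_pop(&bpf_dtor_kptr_jobs))) {
		w = container_of(n, struct bpf_dtor_kptr_work, fnode);
		w->dtor(w->obj);
		pcpu_freelist_push(&bpf_dtor_kptr_idle, &w->fnode);
	}
}

static DEFINE_IRQ_WORK(bpf_dtor_kptr_irq_work, bpf_dtor_kptr_run);

static void bpf_dtor_kptr_defer(void *obj, btf_dtor_kfunc_t dtor)
{
	struct pcpu_freelist_node *n;
	struct bpf_dtor_kptr_work *w;

	/* Never allocate here: we may be in NMI. Work nodes are
	 * preallocated into the idle pool, and the surplus counter keeps
	 * idle >= outstanding refs so this pop should not come up empty.
	 */
	n = pcpu_freelist_pop(&bpf_dtor_kptr_idle);
	if (WARN_ON_ONCE(!n))
		return;
	w = container_of(n, struct bpf_dtor_kptr_work, fnode);
	w->obj = obj;
	w->dtor = dtor;
	pcpu_freelist_push(&bpf_dtor_kptr_jobs, &w->fnode);
	irq_work_queue(&bpf_dtor_kptr_irq_work);
}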
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> 
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25513277524



Thread overview: 23+ messages
2026-05-07 17:54 [bpf-next v3 0/2] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-05-07 17:54 ` [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI Justin Suess
2026-05-07 18:43   ` bot+bpf-ci
2026-05-07 18:52     ` Justin Suess [this message]
2026-05-07 23:45   ` sashiko-bot
2026-05-10 15:13     ` Justin Suess
2026-05-10 22:38       ` Alexei Starovoitov
2026-05-11  1:49         ` Justin Suess
2026-05-11 15:51           ` Alexei Starovoitov
2026-05-11 16:38             ` Justin Suess
2026-05-11 17:18               ` Alexei Starovoitov
2026-05-11 20:10                 ` Kumar Kartikeya Dwivedi
2026-05-12  1:43                   ` Justin Suess
2026-05-12  1:46                     ` Kumar Kartikeya Dwivedi
2026-05-12  1:55                       ` Alexei Starovoitov
2026-05-12  2:03                         ` Kumar Kartikeya Dwivedi
2026-05-12  2:10                           ` Alexei Starovoitov
2026-05-12  2:13                             ` Kumar Kartikeya Dwivedi
2026-05-12  2:07                         ` Justin Suess
2026-05-12  2:08                           ` Kumar Kartikeya Dwivedi
2026-05-11 19:22             ` Justin Suess
2026-05-07 17:54 ` [bpf-next v3 2/2] selftests/bpf: Add kptr destructor NMI exerciser Justin Suess
2026-05-08  0:03   ` sashiko-bot
