BPF List
From: Justin Suess <utilityemal77@gmail.com>
To: sashiko@lists.linux.dev
Cc: bpf@vger.kernel.org
Subject: Re: [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI
Date: Sun, 10 May 2026 11:13:36 -0400	[thread overview]
Message-ID: <agCXEJKJh-JGMhjG@zenbox> (raw)
In-Reply-To: <20260507234520.646C4C2BCB2@smtp.kernel.org>

On Thu, May 07, 2026 at 11:45:19PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] TOCTOU race in `bpf_dtor_kptr_release_one()` can lead to NMI deadlocks.
> - [Low] Memory leak on initialization failure in `bpf_dtor_kptr_init()`.
> --
> 
> commit 488cdc390b90427a591d47211600c09addc95fcd
> Author: Justin Suess <utilityemal77@gmail.com>
> Date:   Thu May 7 17:54:52 2026 +0000
> 
>     bpf: Offload kptr destructors that run from NMI
> 
> This commit queues destructor-backed kptr teardowns to irq_work when
> running from NMI context. It maintains an idle-slot surplus counter to
> ensure offload slots are preallocated to prevent deadlocks.
> 
> Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
> 
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 3b1f0ba02f613..d34fdb99eb8a9 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> [ ... ]
> > @@ -65,6 +69,111 @@ static DEFINE_SPINLOCK(map_idr_lock);
> > +static void bpf_dtor_kptr_release_one(void)
> > +{
> > +	struct bpf_dtor_kptr_work *job;
> > +	long surplus;
> > +
> > +	for (;;) {
> > +		surplus = atomic_long_read(&bpf_dtor_kptr_surplus);
> > +		if (surplus <= 0)
> > +			return;
> > +
> > +		job = bpf_dtor_kptr_pop_idle();
> 
> Could this sequence lead to a race condition that drains the idle pool
> too early?
> 
> If multiple threads read a positive surplus value here, they might all call
> bpf_dtor_kptr_pop_idle() before the cmpxchg executes. This could temporarily
> drain the idle pool below the required number of reserved slots.
> 
> If an NMI fires during this window and attempts to tear down a kptr, would
> bpf_dtor_kptr_offload() find the idle pool empty? If so, it looks like it
> would trigger the fallback warning and run the destructor inline from the
> NMI context, which might reintroduce the deadlocks this patch aims to
> prevent.
>
Yes, this race is real.

But there's a bigger problem: the push side of pcpu_freelist is not
NMI-safe.

If an NMI arrives while we are mid-push, the per-cpu freelist lock is
already held, so a pop from the NMI handler would spin on that lock
forever. We just deadlock again.

I think we technically have exclusivity here on our percpu head
(we can't be in NMI and process context simultaneously on the cpu).

So maybe it is safe to forcibly acquire that specific lock? I'm not 100%
sure.

llist won't work since we can't have multiple llist_del_first consumers,
and llist_del_all detaches the whole list, leaving it empty and unusable
for an NMI that fires while we're borrowing it.

Making the llists percpu won't work either, since the cpu we allocate
the job on might not be the one the dtor runs on, so we'd need cross-cpu
stealing, which runs into the same multiple-consumers problem when
threads steal from the same cpu's head.

So this code is not ready either way.

I think a 128-bit cmpxchg could technically solve this problem, but I
don't think we can rely on it being available on all architectures BPF
supports.

I've been stuck on this for several weeks.

The construct I need is a data structure with atomic push, pop, and move
between sets. No particular ordering is needed, but updates from
producers must be immediately visible to all consumers. I've looked into
Treiber stacks.

Any help or guidance on this would be appreciated!
> > +		if (!job)
> > +			return;
> > +
> > +		if (!atomic_long_try_cmpxchg(&bpf_dtor_kptr_surplus, &surplus,
> > +						     surplus - 1)) {
> > +			pcpu_freelist_push(&bpf_dtor_kptr_idle, &job->fnode);
> > +			continue;
> > +		}
> > +
> > +		bpf_mem_free(&bpf_global_ma, job);
> > +		return;
> > +	}
> > +}
> [ ... ]
> > +static int __init bpf_dtor_kptr_init(void)
> > +{
> > +	int err;
> > +
> > +	err = pcpu_freelist_init(&bpf_dtor_kptr_idle);
> > +	if (err)
> > +		return err;
> > +
> > +	err = pcpu_freelist_init(&bpf_dtor_kptr_jobs);
> > +	if (err)
> > +		return err;
> 
> Does this error path leak the per-cpu memory allocated for the idle list?
> 
> If the second pcpu_freelist_init() fails, the function returns immediately.
> Should there be a cleanup path here to free the already initialized
> bpf_dtor_kptr_idle list?
> 
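Agreed, that's a leak. Sketch of the fix below (assuming
pcpu_freelist_destroy() is the matching teardown helper, as other
pcpu_freelist users call on their error paths):

	/* Sketch only: error path tears down the already-initialized list. */
	static int __init bpf_dtor_kptr_init(void)
	{
		int err;

		err = pcpu_freelist_init(&bpf_dtor_kptr_idle);
		if (err)
			return err;

		err = pcpu_freelist_init(&bpf_dtor_kptr_jobs);
		if (err) {
			pcpu_freelist_destroy(&bpf_dtor_kptr_idle);
			return err;
		}

		return 0;
	}

Will fold that into the next spin.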
> > +
> > +	return 0;
> > +}
> > +late_initcall(bpf_dtor_kptr_init);
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260507175453.1140400-1-utilityemal77@gmail.com?part=1


Thread overview: 23+ messages
2026-05-07 17:54 [bpf-next v3 0/2] bpf: Fix deadlock in kptr dtor in nmi Justin Suess
2026-05-07 17:54 ` [bpf-next v3 1/2] bpf: Offload kptr destructors that run from NMI Justin Suess
2026-05-07 18:43   ` bot+bpf-ci
2026-05-07 18:52     ` Justin Suess
2026-05-07 23:45   ` sashiko-bot
2026-05-10 15:13     ` Justin Suess [this message]
2026-05-10 22:38       ` Alexei Starovoitov
2026-05-11  1:49         ` Justin Suess
2026-05-11 15:51           ` Alexei Starovoitov
2026-05-11 16:38             ` Justin Suess
2026-05-11 17:18               ` Alexei Starovoitov
2026-05-11 20:10                 ` Kumar Kartikeya Dwivedi
2026-05-12  1:43                   ` Justin Suess
2026-05-12  1:46                     ` Kumar Kartikeya Dwivedi
2026-05-12  1:55                       ` Alexei Starovoitov
2026-05-12  2:03                         ` Kumar Kartikeya Dwivedi
2026-05-12  2:10                           ` Alexei Starovoitov
2026-05-12  2:13                             ` Kumar Kartikeya Dwivedi
2026-05-12  2:07                         ` Justin Suess
2026-05-12  2:08                           ` Kumar Kartikeya Dwivedi
2026-05-11 19:22             ` Justin Suess
2026-05-07 17:54 ` [bpf-next v3 2/2] selftests/bpf: Add kptr destructor NMI exerciser Justin Suess
2026-05-08  0:03   ` sashiko-bot
