public inbox for bpf@vger.kernel.org
From: Justin Suess <utilityemal77@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, eddyz87@gmail.com, martin.lau@linux.dev,
	yonghong.song@linux.dev, jolsa@kernel.org
Subject: Re: [BUG] bpf: Soft lockup / panic triggered by bpf_task_release_dtor from NMI on rcu_nocbs CPU
Date: Wed, 22 Apr 2026 10:39:03 -0400
Message-ID: <aejdhxJ4ev3mxE--@suesslenovo>
In-Reply-To: <CAP01T77626uOQLrxkiMRDE_-VyMO=BDEYYRNEW1fgkj9yRv1NQ@mail.gmail.com>

On Tue, Apr 21, 2026 at 11:44:42PM +0200, Kumar Kartikeya Dwivedi wrote:
> On Tue, 21 Apr 2026 at 23:34, Justin Suess <utilityemal77@gmail.com> wrote:
> >
> > On Tue, Apr 21, 2026 at 10:23:56PM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Tue, 21 Apr 2026 at 22:10, Justin Suess <utilityemal77@gmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I found a reproducible soft lockup / panic involving BPF task kptr destruction from NMI context.
> > > >
> > > > It was found while investigating a Sashiko report on my patch:
> > > > https://lore.kernel.org/bpf/20260420203306.3107246-1-utilityemal77@gmail.com/T/#t
> > > >
> > > > The issue is reproducible with a BPF selftest-derived reproducer that:
> > > >
> > > > 1. Stores exited task references in a BPF hash map as refcounted task kptrs.
> > > > 2. Deletes those kptrs from a `tp_btf/nmi_handler` program.
> > > > 3. Runs on an `rcu_nocbs` CPU.
> > > >
> > > > In my setup this eventually triggers a soft lockup and panic in a workqueue thread stuck in:
> > > >
> > > > `perf_sched_delayed`
> > > > `  -> static_key_disable()`
> > > > `  -> arch_jump_label_transform_apply()`
> > > > `  -> smp_text_poke_batch_finish()`
> > > > `  -> on_each_cpu_cond_mask()`
> > > > `  -> smp_call_function_many_cond()`
> > > >
> > > > The triggering condition appears to be that `bpf_task_release_dtor()` can run in NMI context and reach the last-ref `put_task_struct_rcu_user()` path on an offloaded RCU callback CPU.
> > > >
> > > > The affected code path is the dtor triggered by deleting the last reference to a task_struct kptr:
> > > >
> > > > `bpf_map_delete_elem()`
> > > > `  -> htab_map_delete_elem()`
> > > > `  -> free_htab_elem()`
> > > > `  -> bpf_obj_free_fields()`
> > > > `  -> bpf_task_release_dtor()`
> > > > `  -> put_task_struct_rcu_user()`
> > > > `  -> call_rcu()`
> > > >
> > > > This is triggered from:
> > > >
> > > > `tp_btf/nmi_handler`
> > > > `  -> clear_task_kptrs_from_nmi` (reproducer bpf prog)
> > > >
> > > > Environment
> > > >
> > > > - x86_64 QEMU VM
> > > > - PREEMPT(full)
> > > > - `CONFIG_RCU_EXPERT=y`
> > > > - `CONFIG_RCU_NOCB_CPU=y`
> > > > - booted with `rcu_nocbs=1-7`
> > > >
> > > > [...]
> > >
> > > Makes sense. I think the reasonable path is to just close usage in the
> > > NMI context, otherwise we must address each case. Could you try the
> > > attached diff and let me know if it successfully rejects kptr usage
> > > here? Thanks.
> > Didn't work for me.
> >
> > is_tracing_prog_type, despite the name, does not return true
> > for BPF_PROG_TYPE_TRACING. It only covers the older tracing types
> > such as BPF_PROG_TYPE_TRACEPOINT and BPF_PROG_TYPE_KPROBE.
> >
> 
> We can add it but return false when expected_attach_type ==
> BPF_TRACE_ITER. For all other cases, allowing it doesn't make sense
> because these might potentially run in NMI context.
> 
> Please let me know if you'd like to send a fix + tests, otherwise I
> can follow up. Feel free to fold the diff I sent into your fix, no
> attribution needed.
> 
> > I'm honestly still not sure what the difference is, but they are
> > different [1]
> >
> > Would you rather do this or just reject the dtors with a
> > kfunc filter for this program type?
> >
> > Or teach the verifier that the kptr ops need to be offloaded with
> > bpf_task_work_schedule_resume_impl?
> >
> > [1]: https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_TRACING/
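
For reference, a minimal userspace model of the rule Kumar proposes above: treat BPF_PROG_TYPE_TRACING as a tracing type unless expected_attach_type is BPF_TRACE_ITER, and refuse maps carrying kptrs for such programs. This is a sketch under stated assumptions, not verifier code. The MODEL_* enums, model_is_tracing_prog() and model_kptr_map_rejected() are hypothetical names, their values do not match the UAPI numbering, and the pre-existing tracing types listed are a simplified mirror of is_tracing_prog_type().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. These enums borrow a few names from the kernel's
 * enum bpf_prog_type / enum bpf_attach_type, but the values are local and
 * do not match the UAPI numbering.
 */
enum model_prog_type {
	MODEL_PROG_TYPE_SOCKET_FILTER,
	MODEL_PROG_TYPE_KPROBE,
	MODEL_PROG_TYPE_TRACEPOINT,
	MODEL_PROG_TYPE_PERF_EVENT,
	MODEL_PROG_TYPE_RAW_TRACEPOINT,
	MODEL_PROG_TYPE_TRACING,
};

enum model_attach_type {
	MODEL_ATTACH_NONE,
	MODEL_TRACE_RAW_TP,	/* tp_btf/... */
	MODEL_TRACE_FENTRY,	/* fentry/... */
	MODEL_TRACE_ITER,	/* iter/... */
};

/*
 * Hypothetical extension of is_tracing_prog_type(): TRACING programs count
 * as tracing (and may run in NMI) unless they are iterators, which run from
 * process context when user space reads the iterator.
 */
static bool model_is_tracing_prog(enum model_prog_type type,
				  enum model_attach_type attach)
{
	/* Model invariant: only TRACING programs carry an attach type here. */
	assert(type == MODEL_PROG_TYPE_TRACING || attach == MODEL_ATTACH_NONE);

	switch (type) {
	case MODEL_PROG_TYPE_KPROBE:
	case MODEL_PROG_TYPE_TRACEPOINT:
	case MODEL_PROG_TYPE_PERF_EVENT:
	case MODEL_PROG_TYPE_RAW_TRACEPOINT:
		return true;
	case MODEL_PROG_TYPE_TRACING:
		return attach != MODEL_TRACE_ITER;
	default:
		return false;
	}
}

/* Hypothetical load-time rule: refuse maps holding kptrs for such programs. */
static bool model_kptr_map_rejected(enum model_prog_type type,
				    enum model_attach_type attach,
				    bool map_has_kptr)
{
	return map_has_kptr && model_is_tracing_prog(type, attach);
}
```

Under this rule, model_kptr_map_rejected(MODEL_PROG_TYPE_TRACING, MODEL_TRACE_RAW_TP, true) returns true, so any tp_btf or fentry program using a kptr map would fail to load. That is the kind of breakage described below for existing selftests and programs.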

Sorry for the double reply, but the change you're requesting for the fix
will cause breakage.

This will, at a minimum, break the test_bpf_ma and percpu_alloc_array selftests.

More importantly, this will break existing progs that use kptrs in
tracepoints.

Would a narrower fix that filters out the dtor kfuncs specifically be a
better option? Or, better, fix the kfuncs using irq_work? I think
the real fix is to make BPF smarter about when it's running under
NMI, but that may be non-trivial.
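
One way to read the irq_work option: instead of running the release path directly from NMI, the last put pushes the object onto a lock-free list, and a later callback (irq_work in the kernel) performs the real release from a safe context. The sketch below models that pattern in userspace C11. struct model_obj, model_put(), defer_release() and drain_deferred() are hypothetical, and model_release() merely stands in for the put_task_struct_rcu_user() -> call_rcu() path. In-kernel, the analogous pieces would presumably be llist_add(), irq_work_queue() and llist_del_all(), but that is an assumption, not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as task_struct. */
struct model_obj {
	atomic_int refs;
	struct model_obj *next;	/* link used only while queued for release */
	bool released;
};

/* Lock-free list of objects whose final put happened in "NMI" context. */
static _Atomic(struct model_obj *) deferred_head = NULL;

static void model_obj_init(struct model_obj *obj, int refs)
{
	atomic_init(&obj->refs, refs);
	obj->next = NULL;
	obj->released = false;
}

/* Stand-in for the path that is unsafe in NMI (put_task_struct_rcu_user -> call_rcu). */
static void model_release(struct model_obj *obj)
{
	obj->released = true;
}

/* Push with a CAS loop only: no locks, so it is usable from NMI context. */
static void defer_release(struct model_obj *obj)
{
	struct model_obj *old = atomic_load(&deferred_head);

	do {
		obj->next = old;
	} while (!atomic_compare_exchange_weak(&deferred_head, &old, obj));
}

/* Drop one reference; only the last put is affected by the context check. */
static void model_put(struct model_obj *obj, bool in_nmi)
{
	int old = atomic_fetch_sub(&obj->refs, 1);

	assert(old > 0);	/* catch refcount underflow */
	if (old != 1)
		return;		/* other references remain */
	if (in_nmi)
		defer_release(obj);
	else
		model_release(obj);
}

/* Stand-in for the irq_work callback: detach the whole list, then release. */
static int drain_deferred(void)
{
	struct model_obj *obj = atomic_exchange(&deferred_head, NULL);
	int n = 0;

	while (obj) {
		struct model_obj *next = obj->next;

		model_release(obj);
		obj = next;
		n++;
	}
	return n;
}
```

Pushing with a single compare-and-swap and draining by atomically exchanging the whole list keeps the NMI side lock-free, which mirrors how the kernel's llist allows concurrent llist_add() and llist_del_all() without locking.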

If the breakage is acceptable, should I just remove those tests?

Justin


Thread overview: 7+ messages
2026-04-21 20:10 [BUG] bpf: Soft lockup / panic triggered by bpf_task_release_dtor from NMI on rcu_nocbs CPU Justin Suess
2026-04-21 20:23 ` Kumar Kartikeya Dwivedi
2026-04-21 21:34   ` Justin Suess
2026-04-21 21:44     ` Kumar Kartikeya Dwivedi
2026-04-22 11:58       ` Justin Suess
2026-04-22 14:39       ` Justin Suess [this message]
2026-04-22 20:47         ` Kumar Kartikeya Dwivedi
