Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID

Sashiko discussions
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
       [not found] ` <20260424195958.AED4EC19425@smtp.kernel.org>
@ 2026-04-24 20:12   ` Justin Suess
  2026-04-24 22:05     ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Justin Suess @ 2026-04-24 20:12 UTC (permalink / raw)
  To: sashiko; +Cc: bpf

On Fri, Apr 24, 2026 at 07:59:57PM +0000, sashiko-bot@kernel.org wrote:
> Sashiko AI review found 1 potential issue(s):
> - [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
> --
> 
> commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
> Author: Justin Suess <utilityemal77@gmail.com>
> 
> bpf: Implement dtor for struct file BTF ID
> 
> This commit implements a destructor for struct file, allowing it to be
> stored in a map as a referenced kptr. The destructor calls fput to
> release the reference.
> 
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 2bb60200c2665e..ee5ceb82e4f812 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> [ ... ]
> > @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> >  }
> >  CFI_NOSEAL(bpf_task_release_dtor);
> >  
> > +__bpf_kfunc void bpf_file_release_dtor(void *file)
> > +{
> > +	fput((struct file *)file);
> > +}
> 
> Can this cause a deadlock if the destructor is invoked from NMI context?
> 
> If an NMI-context BPF program (such as a perf event tracing program)
> updates or deletes a map element containing a struct file kptr, the BPF
> framework will synchronously invoke this destructor to clean up the old
> kptr.
> 
> When fput() drops the reference count to zero, it calls
> __fput_deferred(). Since in_interrupt() evaluates to true in NMI
> context, __fput_deferred() bypasses task_work_add() and instead calls
> schedule_delayed_work().
> 
> This leads to the following call chain:
> 
> bpf_file_release_dtor()
>    fput()
>       __fput_deferred()
>          schedule_delayed_work()
>             add_timer()
> 
> The add_timer() function attempts to acquire the timer_base->lock raw
> spinlock. If the NMI interrupted a thread that was already holding this
> lock, the NMI handler will spin indefinitely waiting for it, resulting
> in a hard deadlock.
> 
> Would it be safer to defer the fput() operation using an explicitly
> NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
> contexts?
>
For (human) reviewer context: The NMI stuff needs to be addressed separately. [1]

I guess the AI isn't gonna understand but just doing irq_work wouldn't
be right either as it would break operation ordering for maps and
fixing it here wouldn't fix the other dtors broken in NMI.
(cgroup/task_struct)

Anyways I think the AI didn't find any other problems and the test issue
is fixed so this should be ready for another look.

[1] : https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
> > +
> > +CFI_NOSEAL(bpf_file_release_dtor);
> > +
> >  #ifdef CONFIG_CGROUPS
> [ ... ]
> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260424192252.695240-1-utilityemal77@gmail.com?part=1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
  2026-04-24 20:12   ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
@ 2026-04-24 22:05     ` Alexei Starovoitov
  2026-04-24 23:20       ` Justin Suess
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2026-04-24 22:05 UTC (permalink / raw)
  To: Justin Suess; +Cc: sashiko, bpf

On Fri, Apr 24, 2026 at 1:12 PM Justin Suess <utilityemal77@gmail.com> wrote:
>
> On Fri, Apr 24, 2026 at 07:59:57PM +0000, sashiko-bot@kernel.org wrote:
> > Sashiko AI review found 1 potential issue(s):
> > - [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
> > --
> >
> > commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
> > Author: Justin Suess <utilityemal77@gmail.com>
> >
> > bpf: Implement dtor for struct file BTF ID
> >
> > This commit implements a destructor for struct file, allowing it to be
> > stored in a map as a referenced kptr. The destructor calls fput to
> > release the reference.
> >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index 2bb60200c2665e..ee5ceb82e4f812 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > [ ... ]
> > > @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> > >  }
> > >  CFI_NOSEAL(bpf_task_release_dtor);
> > >
> > > +__bpf_kfunc void bpf_file_release_dtor(void *file)
> > > +{
> > > +   fput((struct file *)file);
> > > +}
> >
> > Can this cause a deadlock if the destructor is invoked from NMI context?
> >
> > If an NMI-context BPF program (such as a perf event tracing program)
> > updates or deletes a map element containing a struct file kptr, the BPF
> > framework will synchronously invoke this destructor to clean up the old
> > kptr.
> >
> > When fput() drops the reference count to zero, it calls
> > __fput_deferred(). Since in_interrupt() evaluates to true in NMI
> > context, __fput_deferred() bypasses task_work_add() and instead calls
> > schedule_delayed_work().
> >
> > This leads to the following call chain:
> >
> > bpf_file_release_dtor()
> >    fput()
> >       __fput_deferred()
> >          schedule_delayed_work()
> >             add_timer()
> >
> > The add_timer() function attempts to acquire the timer_base->lock raw
> > spinlock. If the NMI interrupted a thread that was already holding this
> > lock, the NMI handler will spin indefinitely waiting for it, resulting
> > in a hard deadlock.
> >
> > Would it be safer to defer the fput() operation using an explicitly
> > NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
> > contexts?
> >
> For (human) reviewer context: The NMI stuff needs to be addressed separately. [1]
>
> I guess the AI isn't gonna understand but just doing irq_work wouldn't
> be right either as it would break operation ordering for maps and
> fixing it here wouldn't fix the other dtors broken in NMI.
> (cgroup/task_struct)

Will break operation ordering?
What do you mean?

I feel we should fix things first before being subject
to more of these bugs.
Why cannot we defer to irq_work the whole map element if in_nmi
and call all dtors there?
should be a simple fix.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
  2026-04-24 22:05     ` Alexei Starovoitov
@ 2026-04-24 23:20       ` Justin Suess
  2026-04-25  1:25         ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Justin Suess @ 2026-04-24 23:20 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: sashiko, bpf

On Fri, Apr 24, 2026 at 03:05:47PM -0700, Alexei Starovoitov wrote:
> On Fri, Apr 24, 2026 at 1:12 PM Justin Suess <utilityemal77@gmail.com> wrote:
> >
> > On Fri, Apr 24, 2026 at 07:59:57PM +0000, sashiko-bot@kernel.org wrote:
> > > Sashiko AI review found 1 potential issue(s):
> > > - [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
> > > --
> > >
> > > commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
> > > Author: Justin Suess <utilityemal77@gmail.com>
> > >
> > > bpf: Implement dtor for struct file BTF ID
> > >
> > > This commit implements a destructor for struct file, allowing it to be
> > > stored in a map as a referenced kptr. The destructor calls fput to
> > > release the reference.
> > >
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index 2bb60200c2665e..ee5ceb82e4f812 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > [ ... ]
> > > > @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> > > >  }
> > > >  CFI_NOSEAL(bpf_task_release_dtor);
> > > >
> > > > +__bpf_kfunc void bpf_file_release_dtor(void *file)
> > > > +{
> > > > +   fput((struct file *)file);
> > > > +}
> > >
> > > Can this cause a deadlock if the destructor is invoked from NMI context?
> > >
> > > If an NMI-context BPF program (such as a perf event tracing program)
> > > updates or deletes a map element containing a struct file kptr, the BPF
> > > framework will synchronously invoke this destructor to clean up the old
> > > kptr.
> > >
> > > When fput() drops the reference count to zero, it calls
> > > __fput_deferred(). Since in_interrupt() evaluates to true in NMI
> > > context, __fput_deferred() bypasses task_work_add() and instead calls
> > > schedule_delayed_work().
> > >
> > > This leads to the following call chain:
> > >
> > > bpf_file_release_dtor()
> > >    fput()
> > >       __fput_deferred()
> > >          schedule_delayed_work()
> > >             add_timer()
> > >
> > > The add_timer() function attempts to acquire the timer_base->lock raw
> > > spinlock. If the NMI interrupted a thread that was already holding this
> > > lock, the NMI handler will spin indefinitely waiting for it, resulting
> > > in a hard deadlock.
> > >
> > > Would it be safer to defer the fput() operation using an explicitly
> > > NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
> > > contexts?
> > >
> > For (human) reviewer context: The NMI stuff needs to be addressed separately. [1]
> >
> > I guess the AI isn't gonna understand but just doing irq_work wouldn't
> > be right either as it would break operation ordering for maps and
> > fixing it here wouldn't fix the other dtors broken in NMI.
> > (cgroup/task_struct)
> 
> Will break operation ordering?
> What do you mean?
>
Basically if we just did irq_work_queue for one map op and then we do it
again for the same irq_work it will just fail because the queue has to
be drained before we can schedule more work.

So ordering can't be:

1. irq_work_queue
2. irq_work_queue (again on same queue)
3. hard interrupt ends

It has to be

1. irq_work_queue (everything)
2. hard interrupt ends.

In the first ordering above, step 2 fails because the queue has a
pending item.
> I feel we should fix things first before being subject
> to more of these bugs.
> Why cannot we defer to irq_work the whole map element if in_nmi
> and call all dtors there?
irq_work won't work if there's pending work already. It will just return
false and not allow you to queue anything until the task is over with.

So it would work for one element, but we'd be stuck and need another
free irq_work to free the next element.

So we'd need an irq work, one for every element that we free.

So either: 

1. allocating memory for a new irq_work on the fly, which is an
operation that can fail. So if we're under memory pressure the
element never gets freed.

2. preallocate an irq work for every element ahead of time.

...

I think the solution for this might be to just reject any kfunc or dtor
that isn't explicitly marked NMI safe from any programs that may 
run in NMI.

With a kfunc flag that marks it as being nmi-safe. KF_NMI or
similar. And extending the concept of kfunc flags to dtors to handle
them in the verifier as well.

My reasoning is:

1. it's very easy to inadvertently introduce new kfuncs/dtors that would
break under NMI. (there's two examples in upstream, task_struct and
cgroup dtors).

2. There's very few NMI contexts reachable from BPF. I only found three
cases, tracepoints for irq_handler_entry irq_handler_exit and
nmi_handler.

3. If the user really needs to no nmi unsafe stuff, why can't we just have them
call bpf_task_work_schedule_resume_impl which is designated for this
purpose?

Justin
> should be a simple fix.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
  2026-04-24 23:20       ` Justin Suess
@ 2026-04-25  1:25         ` Alexei Starovoitov
  2026-04-25  2:17           ` Justin Suess
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2026-04-25  1:25 UTC (permalink / raw)
  To: Justin Suess; +Cc: sashiko, bpf

On Fri, Apr 24, 2026 at 4:20 PM Justin Suess <utilityemal77@gmail.com> wrote:
>
>
> So either:
>
> 1. allocating memory for a new irq_work on the fly, which is an
> operation that can fail. So if we're under memory pressure the
> element never gets freed.
>
> 2. preallocate an irq work for every element ahead of time.

obviously that's not what I was referring to.
irq_work is rarely used directly like that.
The common pattern is:
defer_free()
{
        if (llist_add(head + s->offset, &df->objects))
                irq_work_queue(&df->work);

but before going there.
Please enumerate "other dtors broken in NMI".
What exactly is broken?
What is the sequence of events?
All map types with kptrs? or particular ones?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
  2026-04-25  1:25         ` Alexei Starovoitov
@ 2026-04-25  2:17           ` Justin Suess
  0 siblings, 0 replies; 5+ messages in thread
From: Justin Suess @ 2026-04-25  2:17 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: sashiko, bpf

On Fri, Apr 24, 2026 at 06:25:31PM -0700, Alexei Starovoitov wrote:
> On Fri, Apr 24, 2026 at 4:20 PM Justin Suess <utilityemal77@gmail.com> wrote:
> >
> >
> > So either:
> >
> > 1. allocating memory for a new irq_work on the fly, which is an
> > operation that can fail. So if we're under memory pressure the
> > element never gets freed.
> >
> > 2. preallocate an irq work for every element ahead of time.
> 
> obviously that's not what I was referring to.
> irq_work is rarely used directly like that.
> The common pattern is:
> defer_free()
> {
>         if (llist_add(head + s->offset, &df->objects))
>                 irq_work_queue(&df->work);
>
That's smart. So you can use it like a doorbell pattern and avoid
re-calling it. And avoid locking with llist.
> but before going there.
> Please enumerate "other dtors broken in NMI".
> What exactly is broken?
> What is the sequence of events?
> All map types with kptrs? or particular ones?

Only referenced kptrs.

I only tested hashmaps, not any other map types. But I don't see any
reason why this wouldn't apply to other map types.

Here are the specific dtors that know/suspect are buggy in NMI.

1. bpf_task_struct_release_dtor I confirmed. See below

2. bpf_kfree_skb_dtor can do call_rcu_hurry.

3. bpf_crypto_ctx_release_dtor does call_rcu.

4. bpf_cgroup_release_dtor frees via work queue and takes cgroup_mutex.

bpf_cpumask_release_dtor I think is safe.

I made a reproducer for one of these (task_struct dtor).

I triggered the issue by deleting the last reference to a task_struct kptr within the tp_bpf/nmi_handler prog.

`bpf_map_delete_elem()`
`  -> htab_map_delete_elem()`
`  -> free_htab_elem()`
`  -> bpf_obj_free_fields()`
`  -> bpf_task_release_dtor()`
`  -> put_task_struct_rcu_user()`
`  -> call_rcu()`

Log:
  [    1.358433] ================================
  [    1.358433] WARNING: inconsistent lock state
  [    1.358434] 7.0.0-11169-ge4ef174588b8-dirty #16 Tainted: G           OE      
  [    1.358435] --------------------------------
  [    1.358436] inconsistent {INITIAL USE} -> {IN-NMI} usage.
  [    1.358436] test_progs/134 [HC1[1]:SC0[0]:HE0:SE1] takes:
  [    1.358438] ffff8ae3bbc6f0e8 (&rdp->nocb_lock){....}-{2:2}, at: __call_rcu_common.constprop.0+0x316/0x740
  
This is done by loading a task_struct referenced kptr into a map and
deleting it later from within a tp_btf:nmi_handler BPF_PROG_TYPE_TRACING
program.

For the task_struct case;

I have a bug report:
https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/

With reproducer:
https://gist.githubusercontent.com/RazeLighter777/5539336d79ab1854f9e9550c6dcab118/raw/082f1eeb2dd445936e64dd3a33861764690bde82/task_struct_dtor_deadlock.patch

I didn't make a reproducer for other cases.

Thanks for the tip with llist and irq_work, that blew my mind a bit.

Justin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-04-25  2:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260424192252.695240-2-utilityemal77@gmail.com>
     [not found] ` <20260424195958.AED4EC19425@smtp.kernel.org>
2026-04-24 20:12   ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
2026-04-24 22:05     ` Alexei Starovoitov
2026-04-24 23:20       ` Justin Suess
2026-04-25  1:25         ` Alexei Starovoitov
2026-04-25  2:17           ` Justin Suess

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox