public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Justin Suess <utilityemal77@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: sashiko@lists.linux.dev, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID
Date: Fri, 24 Apr 2026 19:20:45 -0400	[thread overview]
Message-ID: <aev6zU2vj6wmaRMG@zenbox> (raw)
In-Reply-To: <CAADnVQKFtuX48V9Yy88GHGVbQyY30M3b3YBvZ2PmYt_-jbohpQ@mail.gmail.com>

On Fri, Apr 24, 2026 at 03:05:47PM -0700, Alexei Starovoitov wrote:
> On Fri, Apr 24, 2026 at 1:12 PM Justin Suess <utilityemal77@gmail.com> wrote:
> >
> > On Fri, Apr 24, 2026 at 07:59:57PM +0000, sashiko-bot@kernel.org wrote:
> > > Sashiko AI review found 1 potential issue(s):
> > > - [High] Calling `fput()` directly from `bpf_file_release_dtor` can lead to a hard deadlock when invoked from NMI context.
> > > --
> > >
> > > commit 6368f91be1fd91bc9cb38d3498f35d279bad04c9
> > > Author: Justin Suess <utilityemal77@gmail.com>
> > >
> > > bpf: Implement dtor for struct file BTF ID
> > >
> > > This commit implements a destructor for struct file, allowing it to be
> > > stored in a map as a referenced kptr. The destructor calls fput to
> > > release the reference.
> > >
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index 2bb60200c2665e..ee5ceb82e4f812 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > [ ... ]
> > > > @@ -2752,6 +2753,13 @@ __bpf_kfunc void bpf_task_release_dtor(void *p)
> > > >  }
> > > >  CFI_NOSEAL(bpf_task_release_dtor);
> > > >
> > > > +__bpf_kfunc void bpf_file_release_dtor(void *file)
> > > > +{
> > > > +   fput((struct file *)file);
> > > > +}
> > >
> > > Can this cause a deadlock if the destructor is invoked from NMI context?
> > >
> > > If an NMI-context BPF program (such as a perf event tracing program)
> > > updates or deletes a map element containing a struct file kptr, the BPF
> > > framework will synchronously invoke this destructor to clean up the old
> > > kptr.
> > >
> > > When fput() drops the reference count to zero, it calls
> > > __fput_deferred(). Since in_interrupt() evaluates to true in NMI
> > > context, __fput_deferred() bypasses task_work_add() and instead calls
> > > schedule_delayed_work().
> > >
> > > This leads to the following call chain:
> > >
> > > bpf_file_release_dtor()
> > >    fput()
> > >       __fput_deferred()
> > >          schedule_delayed_work()
> > >             add_timer()
> > >
> > > The add_timer() function attempts to acquire the timer_base->lock raw
> > > spinlock. If the NMI interrupted a thread that was already holding this
> > > lock, the NMI handler will spin indefinitely waiting for it, resulting
> > > in a hard deadlock.
> > >
> > > Would it be safer to defer the fput() operation using an explicitly
> > > NMI-safe mechanism like irq_work_queue() to handle arbitrary BPF
> > > contexts?
> > >
> > For (human) reviewer context: The NMI stuff needs to be addressed separately. [1]
> >
> > I guess the AI isn't gonna understand but just doing irq_work wouldn't
> > be right either as it would break operation ordering for maps and
> > fixing it here wouldn't fix the other dtors broken in NMI.
> > (cgroup/task_struct)
> 
> Will break operation ordering?
> What do you mean?
>
Basically if we just did irq_work_queue for one map op and then we do it
again for the same irq_work it will just fail because the queue has to
be drained before we can schedule more work.

So ordering can't be:

1. irq_work_queue
2. irq_work_queue (again on same queue)
3. hard interrupt ends

It has to be

1. irq_work_queue (everything)
2. hard interrupt ends.

In the first ordering above, step 2 fails because the queue has a
pending item.
> I feel we should fix things first before being subject
> to more of these bugs.
> Why cannot we defer to irq_work the whole map element if in_nmi
> and call all dtors there?
irq_work won't work if there's pending work already. It will just return
false and not allow you to queue anything until the task is over with.

So it would work for one element, but we'd be stuck and need another
free irq_work to free the next element.

So we'd need an irq work, one for every element that we free.

So either: 

1. allocating memory for a new irq_work on the fly, which is an
operation that can fail. So if we're under memory pressure the
element never gets freed.

2. preallocate an irq work for every element ahead of time.

...

I think the solution for this might be to just reject any kfunc or dtor
that isn't explicitly marked NMI safe from any programs that may 
run in NMI.

With a kfunc flag that marks it as being nmi-safe. KF_NMI or
similar. And extending the concept of kfunc flags to dtors to handle
them in the verifier as well.

My reasoning is:

1. it's very easy to inadvertently introduce new kfuncs/dtors that would
break under NMI. (there's two examples in upstream, task_struct and
cgroup dtors).

2. There's very few NMI contexts reachable from BPF. I only found three
cases, tracepoints for irq_handler_entry irq_handler_exit and
nmi_handler.

3. If the user really needs to no nmi unsafe stuff, why can't we just have them
call bpf_task_work_schedule_resume_impl which is designated for this
purpose?

Justin
> should be a simple fix.

  reply	other threads:[~2026-04-24 23:20 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 19:22 [PATCH bpf-next v3 0/2] Allow storing referenced struct file kptrs in BPF maps Justin Suess
2026-04-24 19:22 ` [PATCH bpf-next v3 1/2] bpf: Implement dtor for struct file BTF ID Justin Suess
2026-04-24 19:52   ` bot+bpf-ci
2026-04-24 19:59   ` sashiko-bot
2026-04-24 20:12     ` Justin Suess
2026-04-24 22:05       ` Alexei Starovoitov
2026-04-24 23:20         ` Justin Suess [this message]
2026-04-25  1:25           ` Alexei Starovoitov
2026-04-25  2:17             ` Justin Suess
2026-04-24 19:22 ` [PATCH bpf-next v3 2/2] selftests/bpf: Add test for map-stored struct file kptrs Justin Suess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aev6zU2vj6wmaRMG@zenbox \
    --to=utilityemal77@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox