public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Amery Hung <ameryhung@gmail.com>
Cc: Gregory Bell <grbell@redhat.com>,
	bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, eddyz87@gmail.com, song@kernel.org,
	yonghong.song@linux.dev, john.fastabend@gmail.com,
	kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
	jolsa@kernel.org
Subject: Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
Date: Tue, 10 Mar 2026 17:20:32 -0700	[thread overview]
Message-ID: <aed1731e-986e-484e-bc16-682b025d26cf@linux.dev> (raw)
In-Reply-To: <CAP01T75dDL5pud_NmCmEapw6p1cZss9LTfh6sBxdxYinVQTP5Q@mail.gmail.com>


On 3/10/26 4:35 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
>>
>> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
>> <memxor@gmail.com> wrote:
>>>
>>> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
>>>>
>>>> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
>>>> <memxor@gmail.com> wrote:
>>>>>
>>>>> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
>>>>>>
>>>>>> When running BPF struct_ops tests consecutively, the second test fails
>>>>>> because BTF object references take longer to be released while their
>>>>>> associated module is removed from btf_modules list immediately
>>>>>> during module unload. This creates a race condition where the second
>>>>>> test retrieves the cached BTF object but cannot find valid module
>>>>>> information, causing btf_try_get_module() to return NULL.
>>>>>>
>>>>>> Fix this by modifying MODULE_STATE_GOING handling to wait for
>>>>>> active BTF references to be released before completing module
>>>>>> cleanup. This ensures BTF objects are properly cleaned up from the
>>>>>> cache before their module metadata is removed, eliminating the timing
>>>>>> window that caused the race condition.
>>>>>>
>>>>>> Signed-off-by: Gregory Bell <grbell@redhat.com>
>>>>>> ---
>>>>>>   kernel/bpf/btf.c | 7 ++++++-
>>>>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>>>> index 09fcbb125155..81c9f46a7bb7 100644
>>>>>> --- a/kernel/bpf/btf.c
>>>>>> +++ b/kernel/bpf/btf.c
>>>>>> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>>>>>>                  list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
>>>>>>                          if (btf_mod->module != module)
>>>>>>                                  continue;
>>>>>> -
>>>>>> +                       unsigned int timeout = 1000;
>>>>>> +                       while (refcount_read(&btf_mod->btf->refcnt) > 1
>>>>>> +                       && timeout > 0 ) {
>>>>>> +                               msleep(1);
>>>>>> +                               timeout--;
>>>>>> +                       }
>>>>>
>>>>> This isn't the right way to address such issues. Your diagnosis is
>>>>> correct but fix is just papering over real problems.
>>>>
>>>> Second that the fix is not right.

pw-bot: cr

>>>>
>>>>>
>>>>> The better way would be to block the discovery of such BTFs once the
>>>>> module is unloaded.
>>>>> I.e. once we return to userspace after module unload, subsequent calls
>>>>> to obtain an fd for such BTFs should not succeed.
>>>>> If someone is racing to look it up while the unload happens, so be it,
>>>>> they will fail anyway.
>>>>>
>>>>> I would use a bool flag in struct btf, set it inside the critical
>>>>> section at this point with WRITE_ONCE, then test it early in
>>>>> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
>>>>> condition.
>>>>> Using READ_ONCE of course.
>>>>> Also, we need to add it to btf_get_fd_by_id, so that once conversions
>>>>> from stale IDs to FDs don't succeed.
>>>>> Looking at load_module_btfs(), it already handles the ENOENT case when
>>>>> id lookup succeeds, but fd lookup fails.
>>>>> So for stale ids it should just skip over their errors if we return
>>>>> the same ENOENT as the refcount_inc_not_zero case.
>>>>> We will still report stale ids, but not allow their conversion.
>>>>> We could filter id lookup too but it will penalize the common code and
>>>>> we need extra radix tree lookup just for BTF case.
>>>>>
>>>>> For me locally, these changes seem to address the problem. I could
>>>>> send the fix, but if you would like to take a stab at the above
>>>>> change, please go ahead.
>>>>>
>>>>
>>>> I guess this is worth a fix because it makes btf_get_fd_by_id() less
>>>> likely to return to stale id. However, there is still a small window
>>>> between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
>>>> creation can still fail.
>>>
>>> That's fine, it means that module unload is happening in parallel, so
>>> it's racy and non-deterministic anyway.
>>> When the user controls the module unload though, they should not be
>>> getting stale BTF fds once module unload has returned to user space.
>>> At that point all associated artifacts shouldn't be discoverable.
>>>
>>>>
>>>> As for the solution, there already is btf_mod->flags. So maybe instead
>>>> of adding a bool, just test the flag?
>>>
>>> You mean struct btf instead of btf_mod? I don't see the flags field there.
>>> The btf_mod->flags won't work, it's already deleted and freed by this
>>> time, and requires us to go first from the BTF to the btf_mod.
>>> So it has to be in struct btf.
>>>
>>
>> I was thinking about updating btf_module->flags when handling
>> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
>> btf_try_get_module() to check flags if it's module BTF. If the module
>> is already freed then it returns -ENOENT.
> 
> Hmm, interesting thought.
> I guess the only downside is that it's O(n) lookup under mutex, which
> can be costly with lots of modules, and always penalizes
> get_fd_by_id() for handling race.
> We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
> where we may repeat it multiple times in the loop, so the cost is
> amplified.
> 

Is btf->id still needed somewhere after MODULE_STATE_GOING? Otherwise, 
directly removing btf->id from btf_idr in btf_module_notify() should 
stop the discovery.

  reply	other threads:[~2026-03-11  0:20 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-10 20:21 [PATCH bpf-next 0/1] Fix BPF struct_ops BTF cleanup race condition Gregory Bell
2026-03-10 20:21 ` [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops Gregory Bell
2026-03-10 20:58   ` bot+bpf-ci
2026-03-10 21:39   ` Kumar Kartikeya Dwivedi
2026-03-10 22:38     ` Amery Hung
2026-03-10 22:47       ` Kumar Kartikeya Dwivedi
2026-03-10 23:17         ` Amery Hung
2026-03-10 23:35           ` Kumar Kartikeya Dwivedi
2026-03-11  0:20             ` Martin KaFai Lau [this message]
2026-03-11  9:30               ` Kumar Kartikeya Dwivedi
2026-03-11  9:34                 ` Kumar Kartikeya Dwivedi
2026-03-11 12:21                 ` Alan Maguire
2026-03-11 13:05                   ` Kumar Kartikeya Dwivedi
2026-03-11 13:03               ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
2026-03-11 13:26                 ` Kumar Kartikeya Dwivedi
2026-03-11 16:48                   ` Greg Bell
2026-03-11 13:40                 ` bot+bpf-ci
2026-03-11 19:10                 ` Martin KaFai Lau
2026-03-11 19:17                   ` Andrii Nakryiko
2026-03-11 19:55                     ` Martin KaFai Lau
2026-03-12  3:08                 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aed1731e-986e-484e-bc16-682b025d26cf@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=ameryhung@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=grbell@redhat.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=memxor@gmail.com \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox