All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alan Maguire <alan.maguire@oracle.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Martin KaFai Lau <martin.lau@linux.dev>
Cc: Amery Hung <ameryhung@gmail.com>,
	Gregory Bell <grbell@redhat.com>,
	bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, eddyz87@gmail.com, song@kernel.org,
	yonghong.song@linux.dev, john.fastabend@gmail.com,
	kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com,
	jolsa@kernel.org
Subject: Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
Date: Wed, 11 Mar 2026 12:21:59 +0000	[thread overview]
Message-ID: <c04a0697-393d-41ec-ba2b-6b90f66097ae@oracle.com> (raw)
In-Reply-To: <CAP01T741UV0TsvxxXRAT-zYjKoHwcpsr2bbMS-eFnxytcng7Kg@mail.gmail.com>

On 11/03/2026 09:30, Kumar Kartikeya Dwivedi wrote:
> On Wed, 11 Mar 2026 at 01:20, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>>
>> On 3/10/26 4:35 PM, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
>>>>
>>>> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
>>>> <memxor@gmail.com> wrote:
>>>>>
>>>>> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
>>>>>>
>>>>>> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
>>>>>> <memxor@gmail.com> wrote:
>>>>>>>
>>>>>>> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
>>>>>>>>
>>>>>>>> When running BPF struct_ops tests consecutively, the second test fails
>>>>>>>> because BTF object references take longer to be released while their
>>>>>>>> associated module is removed from btf_modules list immediately
>>>>>>>> during module unload. This creates a race condition where the second
>>>>>>>> test retrieves the cached BTF object but cannot find valid module
>>>>>>>> information, causing btf_try_get_module() to return NULL.
>>>>>>>>
>>>>>>>> Fix this by modifying MODULE_STATE_GOING handling to wait for
>>>>>>>> active BTF references to be released before completing module
>>>>>>>> cleanup. This ensures BTF objects are properly cleaned up from the
>>>>>>>> cache before their module metadata is removed, eliminating the timing
>>>>>>>> window that caused the race condition.
>>>>>>>>
>>>>>>>> Signed-off-by: Gregory Bell <grbell@redhat.com>
>>>>>>>> ---
>>>>>>>>   kernel/bpf/btf.c | 7 ++++++-
>>>>>>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>>>>>> index 09fcbb125155..81c9f46a7bb7 100644
>>>>>>>> --- a/kernel/bpf/btf.c
>>>>>>>> +++ b/kernel/bpf/btf.c
>>>>>>>> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>>>>>>>>                  list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
>>>>>>>>                          if (btf_mod->module != module)
>>>>>>>>                                  continue;
>>>>>>>> -
>>>>>>>> +                       unsigned int timeout = 1000;
>>>>>>>> +                       while (refcount_read(&btf_mod->btf->refcnt) > 1
>>>>>>>> +                       && timeout > 0 ) {
>>>>>>>> +                               msleep(1);
>>>>>>>> +                               timeout--;
>>>>>>>> +                       }
>>>>>>>
>>>>>>> This isn't the right way to address such issues. Your diagnosis is
>>>>>>> correct but fix is just papering over real problems.
>>>>>>
>>>>>> Second that the fix is not right.
>>
>> pw-bot: cr
>>
>>>>>>
>>>>>>>
>>>>>>> The better way would be to block the discovery of such BTFs once the
>>>>>>> module is unloaded.
>>>>>>> I.e. once we return to userspace after module unload, subsequent calls
>>>>>>> to obtain an fd for such BTFs should not succeed.
>>>>>>> If someone is racing to look it up while the unload happens, so be it,
>>>>>>> they will fail anyway.
>>>>>>>
>>>>>>> I would use a bool flag in struct btf, set it inside the critical
>>>>>>> section at this point with WRITE_ONCE, then test it early in
>>>>>>> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
>>>>>>> condition.
>>>>>>> Using READ_ONCE of course.
>>>>>>> Also, we need to add it to btf_get_fd_by_id, so that once conversions
>>>>>>> from stale IDs to FDs don't succeed.
>>>>>>> Looking at load_module_btfs(), it already handles the ENOENT case when
>>>>>>> id lookup succeeds, but fd lookup fails.
>>>>>>> So for stale ids it should just skip over their errors if we return
>>>>>>> the same ENOENT as the refcount_inc_not_zero case.
>>>>>>> We will still report stale ids, but not allow their conversion.
>>>>>>> We could filter id lookup too but it will penalize the common code and
>>>>>>> we need extra radix tree lookup just for BTF case.
>>>>>>>
>>>>>>> For me locally, these changes seem to address the problem. I could
>>>>>>> send the fix, but if you would like to take a stab at the above
>>>>>>> change, please go ahead.
>>>>>>>
>>>>>>
>>>>>> I guess this is worth a fix because it makes btf_get_fd_by_id() less
>>>>>> likely to return to stale id. However, there is still a small window
>>>>>> between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
>>>>>> creation can still fail.
>>>>>
>>>>> That's fine, it means that module unload is happening in parallel, so
>>>>> it's racy and non-deterministic anyway.
>>>>> When the user controls the module unload though, they should not be
>>>>> getting stale BTF fds once module unload has returned to user space.
>>>>> At that point all associated artifacts shouldn't be discoverable.
>>>>>
>>>>>>
>>>>>> As for the solution, there already is btf_mod->flags. So maybe instead
>>>>>> of adding a bool, just test the flag?
>>>>>
>>>>> You mean struct btf instead of btf_mod? I don't see the flags field there.
>>>>> The btf_mod->flags won't work, it's already deleted and freed by this
>>>>> time, and requires us to go first from the BTF to the btf_mod.
>>>>> So it has to be in struct btf.
>>>>>
>>>>
>>>> I was thinking about updating btf_module->flags when handling
>>>> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
>>>> btf_try_get_module() to check flags if it's module BTF. If the module
>>>> is already freed then it returns -ENOENT.
>>>
>>> Hmm, interesting thought.
>>> I guess the only downside is that it's O(n) lookup under mutex, which
>>> can be costly with lots of modules, and always penalizes
>>> get_fd_by_id() for handling race.
>>> We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
>>> where we may repeat it multiple times in the loop, so the cost is
>>> amplified.
>>>
>>
>> Is btf->id still needed somewhere after MODULE_STATE_GOING? Otherwise,
>> directly removing btf->id from btf_idr in btf_module_notify() should
>> stop the discovery.
> 
> I looked at this, I think we will need to reset the btf->id too.
> Otherwise while btf_free_id is idempotent (no-op on free id), the
> released id can be reused and if we don't guard btf_free_id in btf_put
> it will remove it for some other BTF initialized in the meantime.
> We can likely set a flag or something to skip it in btf_put(). As long
> as the module is holding a reference nothing else should do it.
> 

I wonder if it'd make sense to reduce the cost of btf -> module lookup so that
we could check the flag in contexts like btf_get_fd_by_id()? AI suggested reworking
btf_module lookup from a linked list to a pair of xarrays keyed by struct module *
and struct btf *, and it seems reasonable (testing it now). An easy win might be 
to get rid of the strcmp() to vmlinux in btf_is_module(), replacing it with the 
simpler predicate 

	btf->kernel_btf && btf->base_btf;

to reduce overhead of lots of btf_is_module() lookups. What do you think?

  parent reply	other threads:[~2026-03-11 12:22 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-10 20:21 [PATCH bpf-next 0/1] Fix BPF struct_ops BTF cleanup race condition Gregory Bell
2026-03-10 20:21 ` [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops Gregory Bell
2026-03-10 20:58   ` bot+bpf-ci
2026-03-10 21:39   ` Kumar Kartikeya Dwivedi
2026-03-10 22:38     ` Amery Hung
2026-03-10 22:47       ` Kumar Kartikeya Dwivedi
2026-03-10 23:17         ` Amery Hung
2026-03-10 23:35           ` Kumar Kartikeya Dwivedi
2026-03-11  0:20             ` Martin KaFai Lau
2026-03-11  9:30               ` Kumar Kartikeya Dwivedi
2026-03-11  9:34                 ` Kumar Kartikeya Dwivedi
2026-03-11 12:21                 ` Alan Maguire [this message]
2026-03-11 13:05                   ` Kumar Kartikeya Dwivedi
2026-03-11 13:03               ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
2026-03-11 13:26                 ` Kumar Kartikeya Dwivedi
2026-03-11 16:48                   ` Greg Bell
2026-03-11 13:40                 ` bot+bpf-ci
2026-03-11 19:10                 ` Martin KaFai Lau
2026-03-11 19:17                   ` Andrii Nakryiko
2026-03-11 19:55                     ` Martin KaFai Lau
2026-03-12  3:08                 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c04a0697-393d-41ec-ba2b-6b90f66097ae@oracle.com \
    --to=alan.maguire@oracle.com \
    --cc=ameryhung@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=grbell@redhat.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.