* [PATCH bpf-next 0/1] Fix BPF struct_ops BTF cleanup race condition
@ 2026-03-10 20:21 Gregory Bell
2026-03-10 20:21 ` [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops Gregory Bell
0 siblings, 1 reply; 21+ messages in thread
From: Gregory Bell @ 2026-03-10 20:21 UTC (permalink / raw)
To: bpf
Cc: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, Gregory Bell
This patch fixes a race condition in BPF struct_ops where consecutive
test runs fail due to a timing mismatch between BTF object cleanup and
BTF module removal.
When running BPF struct_ops tests consecutively, the second test fails
because BTF object references take longer to be released while their
associated module is removed from btf_modules list immediately
during module unload. This creates a race condition where the second
test retrieves the cached BTF object but cannot find valid module
information, causing btf_try_get_module() to return NULL.
The fix synchronizes these cleanup operations by waiting for active BTF
references to be released before completing module cleanup, eliminating
the race condition.
This issue is reproducible with any of the following tests: global_map_resize,
pro_epilogue, struct_ops_module, and struct_ops_multi:
./test_progs -t global_map_resize; ./test_progs -t global_map_resize
#134/1 global_map_resize/global_map_resize_bss:OK
#134/2 global_map_resize/global_map_resize_data:OK
#134/3 global_map_resize/global_map_resize_invalid:OK
#134 global_map_resize:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
global_map_resize_bss_subtest:PASS:test_global_map_resize__open 0 nsec
libbpf: map 'test_glo.bss': cannot be resized, last var must be an array
libbpf: map 'test_glo.bss': failed to adjust resized BTF, clearing BTF key/value info: -EINVAL
global_map_resize_bss_subtest:PASS:bpf_map__set_value_size 0 nsec
global_map_resize_bss_subtest:PASS:resize 0 nsec
global_map_resize_bss_subtest:PASS:percpu_arr_resize 0 nsec
global_map_resize_bss_subtest:PASS:array_len 0 nsec
global_map_resize_bss_subtest:PASS:bpf_map__initial_value (ptr) 0 nsec
global_map_resize_bss_subtest:PASS:bpf_map__initial_value (size) 0 nsec
libbpf: map 'st_ops_resize': failed to create: -EINVAL
libbpf: failed to load BPF skeleton 'test_global_map_resize': -EINVAL
global_map_resize_bss_subtest:FAIL:test_global_map_resize__load unexpected error: -22 (errno 22)
#134/1 global_map_resize/global_map_resize_bss:FAIL
global_map_resize_data_subtest:PASS:test_global_map_resize__open 0 nsec
global_map_resize_data_subtest:PASS:bpf_map__set_value_size 0 nsec
global_map_resize_data_subtest:PASS:resize 0 nsec
global_map_resize_data_subtest:PASS:percpu_arr_resize 0 nsec
global_map_resize_data_subtest:PASS:array_len 0 nsec
global_map_resize_data_subtest:PASS:bpf_map__initial_value (ptr) 0 nsec
global_map_resize_data_subtest:PASS:bpf_map__initial_value (size) 0 nsec
libbpf: map 'st_ops_resize': failed to create: -EINVAL
libbpf: failed to load BPF skeleton 'test_global_map_resize': -EINVAL
global_map_resize_data_subtest:FAIL:test_global_map_resize__load unexpected error: -22 (errno 22)
#134/2 global_map_resize/global_map_resize_data:FAIL
#134/3 global_map_resize/global_map_resize_invalid:OK
#134 global_map_resize:FAIL
After the fix, consecutive runs work reliably without requiring delays.
Gregory Bell (1):
bpf: Fix BTF module cleanup race condition in struct_ops
kernel/bpf/btf.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--
2.52.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 20:21 [PATCH bpf-next 0/1] Fix BPF struct_ops BTF cleanup race condition Gregory Bell
@ 2026-03-10 20:21 ` Gregory Bell
2026-03-10 20:58 ` bot+bpf-ci
2026-03-10 21:39 ` Kumar Kartikeya Dwivedi
0 siblings, 2 replies; 21+ messages in thread
From: Gregory Bell @ 2026-03-10 20:21 UTC (permalink / raw)
To: bpf
Cc: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, Gregory Bell
When running BPF struct_ops tests consecutively, the second test fails
because BTF object references take longer to be released while their
associated module is removed from btf_modules list immediately
during module unload. This creates a race condition where the second
test retrieves the cached BTF object but cannot find valid module
information, causing btf_try_get_module() to return NULL.
Fix this by modifying MODULE_STATE_GOING handling to wait for
active BTF references to be released before completing module
cleanup. This ensures BTF objects are properly cleaned up from the
cache before their module metadata is removed, eliminating the timing
window that caused the race condition.
Signed-off-by: Gregory Bell <grbell@redhat.com>
---
kernel/bpf/btf.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 09fcbb125155..81c9f46a7bb7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
if (btf_mod->module != module)
continue;
-
+ unsigned int timeout = 1000;
+ while (refcount_read(&btf_mod->btf->refcnt) > 1
+ && timeout > 0 ) {
+ msleep(1);
+ timeout--;
+ }
list_del(&btf_mod->list);
if (btf_mod->sysfs_attr)
sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
--
2.52.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 20:21 ` [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops Gregory Bell
@ 2026-03-10 20:58 ` bot+bpf-ci
2026-03-10 21:39 ` Kumar Kartikeya Dwivedi
1 sibling, 0 replies; 21+ messages in thread
From: bot+bpf-ci @ 2026-03-10 20:58 UTC (permalink / raw)
To: grbell, bpf
Cc: martin.lau, ast, daniel, andrii, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, grbell, ast, andrii,
daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 2065 bytes --]
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 09fcbb125155..81c9f46a7bb7 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> if (btf_mod->module != module)
> continue;
> + unsigned int timeout = 1000;
> + while (refcount_read(&btf_mod->btf->refcnt) > 1
> + && timeout > 0 ) {
> + msleep(1);
> + timeout--;
> + }
> list_del(&btf_mod->list);
This while loop calls msleep() while holding btf_module_mutex, which
was acquired at the top of the MODULE_STATE_GOING case. This can hold
the mutex for up to 1 second.
During that time, all concurrent callers of btf_try_get_module(),
btf_get_module_btf(), and any other module state notifications that
need btf_module_mutex will block. This includes the BPF verifier path
through __find_kfunc_desc_btf() and check_struct_ops_btf_id().
Additionally, module notifier callbacks run under the module
subsystem's own locks. Sleeping here for up to a second delays all
module load and unload operations system-wide.
Could the mutex be dropped before the wait, or could a different
synchronization mechanism be used here, such as a completion?
Also, if the timeout expires with the refcount still above 1, the
code falls through to list_del() and cleanup anyway. The commit
message states:
> This ensures BTF objects are properly cleaned up from the
> cache before their module metadata is removed, eliminating the timing
> window that caused the race condition.
This does not appear to be accurate, since the timeout makes the fix
probabilistic rather than a guarantee. If a BTF reference is held for
longer than 1 second, the original race condition would still occur.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22923069980
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 20:21 ` [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops Gregory Bell
2026-03-10 20:58 ` bot+bpf-ci
@ 2026-03-10 21:39 ` Kumar Kartikeya Dwivedi
2026-03-10 22:38 ` Amery Hung
1 sibling, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-10 21:39 UTC (permalink / raw)
To: Gregory Bell
Cc: bpf, martin.lau, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
>
> When running BPF struct_ops tests consecutively, the second test fails
> because BTF object references take longer to be released while their
> associated module is removed from btf_modules list immediately
> during module unload. This creates a race condition where the second
> test retrieves the cached BTF object but cannot find valid module
> information, causing btf_try_get_module() to return NULL.
>
> Fix this by modifying MODULE_STATE_GOING handling to wait for
> active BTF references to be released before completing module
> cleanup. This ensures BTF objects are properly cleaned up from the
> cache before their module metadata is removed, eliminating the timing
> window that caused the race condition.
>
> Signed-off-by: Gregory Bell <grbell@redhat.com>
> ---
> kernel/bpf/btf.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 09fcbb125155..81c9f46a7bb7 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> if (btf_mod->module != module)
> continue;
> -
> + unsigned int timeout = 1000;
> + while (refcount_read(&btf_mod->btf->refcnt) > 1
> + && timeout > 0 ) {
> + msleep(1);
> + timeout--;
> + }
This isn't the right way to address such issues. Your diagnosis is
correct, but the fix is just papering over the real problem.
The better way would be to block the discovery of such BTFs once the
module is unloaded.
I.e. once we return to userspace after module unload, subsequent calls
to obtain an fd for such BTFs should not succeed.
If someone is racing to look it up while the unload happens, so be it,
they will fail anyway.
I would use a bool flag in struct btf, set it inside the critical
section at this point with WRITE_ONCE, then test it early in
bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
condition.
Using READ_ONCE of course.
Also, we need to add it to btf_get_fd_by_id, so that conversions
from stale IDs to FDs don't succeed.
Looking at load_module_btfs(), it already handles the ENOENT case when
id lookup succeeds, but fd lookup fails.
So for stale ids it should just skip over their errors if we return
the same ENOENT as the refcount_inc_not_zero case.
We will still report stale ids, but not allow their conversion.
We could filter id lookup too but it will penalize the common code and
we need extra radix tree lookup just for BTF case.
For me locally, these changes seem to address the problem. I could
send the fix, but if you would like to take a stab at the above
change, please go ahead.
> list_del(&btf_mod->list);
> if (btf_mod->sysfs_attr)
> sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
> --
> 2.52.0
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 21:39 ` Kumar Kartikeya Dwivedi
@ 2026-03-10 22:38 ` Amery Hung
2026-03-10 22:47 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 21+ messages in thread
From: Amery Hung @ 2026-03-10 22:38 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: Gregory Bell, bpf, martin.lau, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
> >
> > When running BPF struct_ops tests consecutively, the second test fails
> > because BTF object references take longer to be released while their
> > associated module is removed from btf_modules list immediately
> > during module unload. This creates a race condition where the second
> > test retrieves the cached BTF object but cannot find valid module
> > information, causing btf_try_get_module() to return NULL.
> >
> > Fix this by modifying MODULE_STATE_GOING handling to wait for
> > active BTF references to be released before completing module
> > cleanup. This ensures BTF objects are properly cleaned up from the
> > cache before their module metadata is removed, eliminating the timing
> > window that caused the race condition.
> >
> > Signed-off-by: Gregory Bell <grbell@redhat.com>
> > ---
> > kernel/bpf/btf.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 09fcbb125155..81c9f46a7bb7 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> > if (btf_mod->module != module)
> > continue;
> > -
> > + unsigned int timeout = 1000;
> > + while (refcount_read(&btf_mod->btf->refcnt) > 1
> > + && timeout > 0 ) {
> > + msleep(1);
> > + timeout--;
> > + }
>
> This isn't the right way to address such issues. Your diagnosis is
> correct, but the fix is just papering over the real problem.
Second that the fix is not right.
>
> The better way would be to block the discovery of such BTFs once the
> module is unloaded.
> I.e. once we return to userspace after module unload, subsequent calls
> to obtain an fd for such BTFs should not succeed.
> If someone is racing to look it up while the unload happens, so be it,
> they will fail anyway.
>
> I would use a bool flag in struct btf, set it inside the critical
> section at this point with WRITE_ONCE, then test it early in
> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
> condition.
> Using READ_ONCE of course.
> Also, we need to add it to btf_get_fd_by_id, so that conversions
> from stale IDs to FDs don't succeed.
> Looking at load_module_btfs(), it already handles the ENOENT case when
> id lookup succeeds, but fd lookup fails.
> So for stale ids it should just skip over their errors if we return
> the same ENOENT as the refcount_inc_not_zero case.
> We will still report stale ids, but not allow their conversion.
> We could filter id lookup too but it will penalize the common code and
> we need extra radix tree lookup just for BTF case.
>
> For me locally, these changes seem to address the problem. I could
> send the fix, but if you would like to take a stab at the above
> change, please go ahead.
>
I guess this is worth a fix because it makes btf_get_fd_by_id() less
likely to return a stale id. However, there is still a small window
between btf_get_fd_by_id() and btf_try_get_module(), so struct_ops map
creation can still fail.
As for the solution, there already is btf_mod->flags. So maybe instead
of adding a bool, just test the flag?
>
>
> > list_del(&btf_mod->list);
> > if (btf_mod->sysfs_attr)
> > sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
> > --
> > 2.52.0
> >
> >
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 22:38 ` Amery Hung
@ 2026-03-10 22:47 ` Kumar Kartikeya Dwivedi
2026-03-10 23:17 ` Amery Hung
0 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-10 22:47 UTC (permalink / raw)
To: Amery Hung
Cc: Gregory Bell, bpf, martin.lau, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
>
> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
> > >
> > > When running BPF struct_ops tests consecutively, the second test fails
> > > because BTF object references take longer to be released while their
> > > associated module is removed from btf_modules list immediately
> > > during module unload. This creates a race condition where the second
> > > test retrieves the cached BTF object but cannot find valid module
> > > information, causing btf_try_get_module() to return NULL.
> > >
> > > Fix this by modifying MODULE_STATE_GOING handling to wait for
> > > active BTF references to be released before completing module
> > > cleanup. This ensures BTF objects are properly cleaned up from the
> > > cache before their module metadata is removed, eliminating the timing
> > > window that caused the race condition.
> > >
> > > Signed-off-by: Gregory Bell <grbell@redhat.com>
> > > ---
> > > kernel/bpf/btf.c | 7 ++++++-
> > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index 09fcbb125155..81c9f46a7bb7 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > > list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> > > if (btf_mod->module != module)
> > > continue;
> > > -
> > > + unsigned int timeout = 1000;
> > > + while (refcount_read(&btf_mod->btf->refcnt) > 1
> > > + && timeout > 0 ) {
> > > + msleep(1);
> > > + timeout--;
> > > + }
> >
> > This isn't the right way to address such issues. Your diagnosis is
> > correct, but the fix is just papering over the real problem.
>
> Second that the fix is not right.
>
> >
> > The better way would be to block the discovery of such BTFs once the
> > module is unloaded.
> > I.e. once we return to userspace after module unload, subsequent calls
> > to obtain an fd for such BTFs should not succeed.
> > If someone is racing to look it up while the unload happens, so be it,
> > they will fail anyway.
> >
> > I would use a bool flag in struct btf, set it inside the critical
> > section at this point with WRITE_ONCE, then test it early in
> > bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
> > condition.
> > Using READ_ONCE of course.
> > Also, we need to add it to btf_get_fd_by_id, so that conversions
> > from stale IDs to FDs don't succeed.
> > Looking at load_module_btfs(), it already handles the ENOENT case when
> > id lookup succeeds, but fd lookup fails.
> > So for stale ids it should just skip over their errors if we return
> > the same ENOENT as the refcount_inc_not_zero case.
> > We will still report stale ids, but not allow their conversion.
> > We could filter id lookup too but it will penalize the common code and
> > we need extra radix tree lookup just for BTF case.
> >
> > For me locally, these changes seem to address the problem. I could
> > send the fix, but if you would like to take a stab at the above
> > change, please go ahead.
> >
>
> I guess this is worth a fix because it makes btf_get_fd_by_id() less
> likely to return to stale id. However, there is still a small window
> between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
> creation can still fail.
That's fine, it means that module unload is happening in parallel, so
it's racy and non-deterministic anyway.
When the user controls the module unload though, they should not be
getting stale BTF fds once module unload has returned to user space.
At that point all associated artifacts shouldn't be discoverable.
>
> As for the solution, there already is btf_mod->flags. So maybe instead
> of adding a bool, just test the flag?
You mean struct btf instead of btf_mod? I don't see the flags field there.
The btf_mod->flags won't work, it's already deleted and freed by this
time, and requires us to go first from the BTF to the btf_mod.
So it has to be in struct btf.
>
> >
> >
> > > list_del(&btf_mod->list);
> > > if (btf_mod->sysfs_attr)
> > > sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
> > > --
> > > 2.52.0
> > >
> > >
> >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 22:47 ` Kumar Kartikeya Dwivedi
@ 2026-03-10 23:17 ` Amery Hung
2026-03-10 23:35 ` Kumar Kartikeya Dwivedi
0 siblings, 1 reply; 21+ messages in thread
From: Amery Hung @ 2026-03-10 23:17 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: Gregory Bell, bpf, martin.lau, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
> > > >
> > > > When running BPF struct_ops tests consecutively, the second test fails
> > > > because BTF object references take longer to be released while their
> > > > associated module is removed from btf_modules list immediately
> > > > during module unload. This creates a race condition where the second
> > > > test retrieves the cached BTF object but cannot find valid module
> > > > information, causing btf_try_get_module() to return NULL.
> > > >
> > > > Fix this by modifying MODULE_STATE_GOING handling to wait for
> > > > active BTF references to be released before completing module
> > > > cleanup. This ensures BTF objects are properly cleaned up from the
> > > > cache before their module metadata is removed, eliminating the timing
> > > > window that caused the race condition.
> > > >
> > > > Signed-off-by: Gregory Bell <grbell@redhat.com>
> > > > ---
> > > > kernel/bpf/btf.c | 7 ++++++-
> > > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index 09fcbb125155..81c9f46a7bb7 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > > > list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> > > > if (btf_mod->module != module)
> > > > continue;
> > > > -
> > > > + unsigned int timeout = 1000;
> > > > + while (refcount_read(&btf_mod->btf->refcnt) > 1
> > > > + && timeout > 0 ) {
> > > > + msleep(1);
> > > > + timeout--;
> > > > + }
> > >
> > > This isn't the right way to address such issues. Your diagnosis is
> > > correct, but the fix is just papering over the real problem.
> >
> > Second that the fix is not right.
> >
> > >
> > > The better way would be to block the discovery of such BTFs once the
> > > module is unloaded.
> > > I.e. once we return to userspace after module unload, subsequent calls
> > > to obtain an fd for such BTFs should not succeed.
> > > If someone is racing to look it up while the unload happens, so be it,
> > > they will fail anyway.
> > >
> > > I would use a bool flag in struct btf, set it inside the critical
> > > section at this point with WRITE_ONCE, then test it early in
> > > bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
> > > condition.
> > > Using READ_ONCE of course.
> > > Also, we need to add it to btf_get_fd_by_id, so that conversions
> > > from stale IDs to FDs don't succeed.
> > > Looking at load_module_btfs(), it already handles the ENOENT case when
> > > id lookup succeeds, but fd lookup fails.
> > > So for stale ids it should just skip over their errors if we return
> > > the same ENOENT as the refcount_inc_not_zero case.
> > > We will still report stale ids, but not allow their conversion.
> > > We could filter id lookup too but it will penalize the common code and
> > > we need extra radix tree lookup just for BTF case.
> > >
> > > For me locally, these changes seem to address the problem. I could
> > > send the fix, but if you would like to take a stab at the above
> > > change, please go ahead.
> > >
> >
> > I guess this is worth a fix because it makes btf_get_fd_by_id() less
> > likely to return to stale id. However, there is still a small window
> > between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
> > creation can still fail.
>
> That's fine, it means that module unload is happening in parallel, so
> it's racy and non-deterministic anyway.
> When the user controls the module unload though, they should not be
> getting stale BTF fds once module unload has returned to user space.
> At that point all associated artifacts shouldn't be discoverable.
>
> >
> > As for the solution, there already is btf_mod->flags. So maybe instead
> > of adding a bool, just test the flag?
>
> You mean struct btf instead of btf_mod? I don't see the flags field there.
> The btf_mod->flags won't work, it's already deleted and freed by this
> time, and requires us to go first from the BTF to the btf_mod.
> So it has to be in struct btf.
>
I was thinking about updating btf_module->flags when handling
MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
btf_try_get_module() to check flags if it's module BTF. If the module
is already freed then it returns -ENOENT.
>
> >
> > >
> > >
> > > > list_del(&btf_mod->list);
> > > > if (btf_mod->sysfs_attr)
> > > > sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
> > > > --
> > > > 2.52.0
> > > >
> > > >
> > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 23:17 ` Amery Hung
@ 2026-03-10 23:35 ` Kumar Kartikeya Dwivedi
2026-03-11 0:20 ` Martin KaFai Lau
0 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-10 23:35 UTC (permalink / raw)
To: Amery Hung
Cc: Gregory Bell, bpf, martin.lau, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
>
> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
> > >
> > > On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
> > > > >
> > > > > When running BPF struct_ops tests consecutively, the second test fails
> > > > > because BTF object references take longer to be released while their
> > > > > associated module is removed from btf_modules list immediately
> > > > > during module unload. This creates a race condition where the second
> > > > > test retrieves the cached BTF object but cannot find valid module
> > > > > information, causing btf_try_get_module() to return NULL.
> > > > >
> > > > > Fix this by modifying MODULE_STATE_GOING handling to wait for
> > > > > active BTF references to be released before completing module
> > > > > cleanup. This ensures BTF objects are properly cleaned up from the
> > > > > cache before their module metadata is removed, eliminating the timing
> > > > > window that caused the race condition.
> > > > >
> > > > > Signed-off-by: Gregory Bell <grbell@redhat.com>
> > > > > ---
> > > > > kernel/bpf/btf.c | 7 ++++++-
> > > > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > > index 09fcbb125155..81c9f46a7bb7 100644
> > > > > --- a/kernel/bpf/btf.c
> > > > > +++ b/kernel/bpf/btf.c
> > > > > @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > > > > list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> > > > > if (btf_mod->module != module)
> > > > > continue;
> > > > > -
> > > > > + unsigned int timeout = 1000;
> > > > > + while (refcount_read(&btf_mod->btf->refcnt) > 1
> > > > > + && timeout > 0 ) {
> > > > > + msleep(1);
> > > > > + timeout--;
> > > > > + }
> > > >
> > > > This isn't the right way to address such issues. Your diagnosis is
> > > > correct, but the fix is just papering over the real problem.
> > >
> > > Second that the fix is not right.
> > >
> > > >
> > > > The better way would be to block the discovery of such BTFs once the
> > > > module is unloaded.
> > > > I.e. once we return to userspace after module unload, subsequent calls
> > > > to obtain an fd for such BTFs should not succeed.
> > > > If someone is racing to look it up while the unload happens, so be it,
> > > > they will fail anyway.
> > > >
> > > > I would use a bool flag in struct btf, set it inside the critical
> > > > section at this point with WRITE_ONCE, then test it early in
> > > > bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
> > > > condition.
> > > > Using READ_ONCE of course.
> > > > Also, we need to add it to btf_get_fd_by_id, so that conversions
> > > > from stale IDs to FDs don't succeed.
> > > > Looking at load_module_btfs(), it already handles the ENOENT case when
> > > > id lookup succeeds, but fd lookup fails.
> > > > So for stale ids it should just skip over their errors if we return
> > > > the same ENOENT as the refcount_inc_not_zero case.
> > > > We will still report stale ids, but not allow their conversion.
> > > > We could filter id lookup too but it will penalize the common code and
> > > > we need extra radix tree lookup just for BTF case.
> > > >
> > > > For me locally, these changes seem to address the problem. I could
> > > > send the fix, but if you would like to take a stab at the above
> > > > change, please go ahead.
> > > >
> > >
> > > I guess this is worth a fix because it makes btf_get_fd_by_id() less
> > > likely to return a stale id. However, there is still a small window
> > > between btf_get_fd_by_id() and btf_try_get_module(), so struct_ops map
> > > creation can still fail.
> >
> > That's fine, it means that module unload is happening in parallel, so
> > it's racy and non-deterministic anyway.
> > When the user controls the module unload though, they should not be
> > getting stale BTF fds once module unload has returned to user space.
> > At that point all associated artifacts shouldn't be discoverable.
> >
> > >
> > > As for the solution, there already is btf_mod->flags. So maybe instead
> > > of adding a bool, just test the flag?
> >
> > You mean struct btf instead of btf_mod? I don't see the flags field there.
> > The btf_mod->flags won't work, it's already deleted and freed by this
> > time, and requires us to go first from the BTF to the btf_mod.
> > So it has to be in struct btf.
> >
>
> I was thinking about updating btf_module->flags when handling
> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
> btf_try_get_module() to check flags if it's module BTF. If the module
> is already freed then it returns -ENOENT.
Hmm, interesting thought.
I guess the only downside is that it's O(n) lookup under mutex, which
can be costly with lots of modules, and always penalizes
get_fd_by_id() for handling race.
We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
where we may repeat it multiple times in the loop, so the cost is
amplified.
Let's see what other maintainers think.
>
> >
> > >
> > > >
> > > >
> > > > > list_del(&btf_mod->list);
> > > > > if (btf_mod->sysfs_attr)
> > > > > sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
> > > > > --
> > > > > 2.52.0
> > > > >
> > > > >
> > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-10 23:35 ` Kumar Kartikeya Dwivedi
@ 2026-03-11 0:20 ` Martin KaFai Lau
2026-03-11 9:30 ` Kumar Kartikeya Dwivedi
2026-03-11 13:03 ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
0 siblings, 2 replies; 21+ messages in thread
From: Martin KaFai Lau @ 2026-03-11 0:20 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, Amery Hung
Cc: Gregory Bell, bpf, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On 3/10/26 4:35 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
>>
>> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
>> <memxor@gmail.com> wrote:
>>>
>>> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
>>>>
>>>> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
>>>> <memxor@gmail.com> wrote:
>>>>>
>>>>> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
>>>>>>
>>>>>> When running BPF struct_ops tests consecutively, the second test fails
>>>>>> because BTF object references take longer to be released while their
>>>>>> associated module is removed from btf_modules list immediately
>>>>>> during module unload. This creates a race condition where the second
>>>>>> test retrieves the cached BTF object but cannot find valid module
>>>>>> information, causing btf_try_get_module() to return NULL.
>>>>>>
>>>>>> Fix this by modifying MODULE_STATE_GOING handling to wait for
>>>>>> active BTF references to be released before completing module
>>>>>> cleanup. This ensures BTF objects are properly cleaned up from the
>>>>>> cache before their module metadata is removed, eliminating the timing
>>>>>> window that caused the race condition.
>>>>>>
>>>>>> Signed-off-by: Gregory Bell <grbell@redhat.com>
>>>>>> ---
>>>>>> kernel/bpf/btf.c | 7 ++++++-
>>>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>>>> index 09fcbb125155..81c9f46a7bb7 100644
>>>>>> --- a/kernel/bpf/btf.c
>>>>>> +++ b/kernel/bpf/btf.c
>>>>>> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>>>>>> list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
>>>>>> if (btf_mod->module != module)
>>>>>> continue;
>>>>>> -
>>>>>> + unsigned int timeout = 1000;
>>>>>> + while (refcount_read(&btf_mod->btf->refcnt) > 1
>>>>>> + && timeout > 0 ) {
>>>>>> + msleep(1);
>>>>>> + timeout--;
>>>>>> + }
>>>>>
>>>>> This isn't the right way to address such issues. Your diagnosis is
>>>>> correct, but the fix is just papering over the real problem.
>>>>
>>>> Second that the fix is not right.
pw-bot: cr
>>>>
>>>>>
>>>>> The better way would be to block the discovery of such BTFs once the
>>>>> module is unloaded.
>>>>> I.e. once we return to userspace after module unload, subsequent calls
>>>>> to obtain an fd for such BTFs should not succeed.
>>>>> If someone is racing to look it up while the unload happens, so be it,
>>>>> they will fail anyway.
>>>>>
>>>>> I would use a bool flag in struct btf, set it inside the critical
>>>>> section at this point with WRITE_ONCE, then test it early in
>>>>> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
>>>>> condition.
>>>>> Using READ_ONCE of course.
>>>>> Also, we need to add it to btf_get_fd_by_id, so that conversions
>>>>> from stale IDs to FDs don't succeed.
>>>>> Looking at load_module_btfs(), it already handles the ENOENT case when
>>>>> id lookup succeeds, but fd lookup fails.
>>>>> So for stale ids it should just skip over their errors if we return
>>>>> the same ENOENT as the refcount_inc_not_zero case.
>>>>> We will still report stale ids, but not allow their conversion.
>>>>> We could filter id lookup too but it will penalize the common code and
>>>>> we need extra radix tree lookup just for BTF case.
>>>>>
>>>>> For me locally, these changes seem to address the problem. I could
>>>>> send the fix, but if you would like to take a stab at the above
>>>>> change, please go ahead.
>>>>>
>>>>
>>>> I guess this is worth a fix because it makes btf_get_fd_by_id() less
>>>> likely to return a stale id. However, there is still a small window
>>>> between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
>>>> creation can still fail.
>>>
>>> That's fine, it means that module unload is happening in parallel, so
>>> it's racy and non-deterministic anyway.
>>> When the user controls the module unload though, they should not be
>>> getting stale BTF fds once module unload has returned to user space.
>>> At that point all associated artifacts shouldn't be discoverable.
>>>
>>>>
>>>> As for the solution, there already is btf_mod->flags. So maybe instead
>>>> of adding a bool, just test the flag?
>>>
>>> You mean struct btf instead of btf_mod? I don't see the flags field there.
>>> The btf_mod->flags won't work, it's already deleted and freed by this
>>> time, and requires us to go first from the BTF to the btf_mod.
>>> So it has to be in struct btf.
>>>
>>
>> I was thinking about updating btf_module->flags when handling
>> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
>> btf_try_get_module() to check flags if it's module BTF. If the module
>> is already freed then it returns -ENOENT.
>
> Hmm, interesting thought.
> I guess the only downside is that it's O(n) lookup under mutex, which
> can be costly with lots of modules, and always penalizes
> get_fd_by_id() for handling race.
> We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
> where we may repeat it multiple times in the loop, so the cost is
> amplified.
>
Is btf->id still needed somewhere after MODULE_STATE_GOING? Otherwise,
directly removing btf->id from btf_idr in btf_module_notify() should
stop the discovery.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-11 0:20 ` Martin KaFai Lau
@ 2026-03-11 9:30 ` Kumar Kartikeya Dwivedi
2026-03-11 9:34 ` Kumar Kartikeya Dwivedi
2026-03-11 12:21 ` Alan Maguire
2026-03-11 13:03 ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
1 sibling, 2 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-11 9:30 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Amery Hung, Gregory Bell, bpf, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On Wed, 11 Mar 2026 at 01:20, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
>
> On 3/10/26 4:35 PM, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
> >>
> >> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
> >> <memxor@gmail.com> wrote:
> >>>
> >>> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
> >>>>
> >>>> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
> >>>> <memxor@gmail.com> wrote:
> >>>>>
> >>>>> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
> >>>>>>
> >>>>>> When running BPF struct_ops tests consecutively, the second test fails
> >>>>>> because BTF object references take longer to be released while their
> >>>>>> associated module is removed from btf_modules list immediately
> >>>>>> during module unload. This creates a race condition where the second
> >>>>>> test retrieves the cached BTF object but cannot find valid module
> >>>>>> information, causing btf_try_get_module() to return NULL.
> >>>>>>
> >>>>>> Fix this by modifying MODULE_STATE_GOING handling to wait for
> >>>>>> active BTF references to be released before completing module
> >>>>>> cleanup. This ensures BTF objects are properly cleaned up from the
> >>>>>> cache before their module metadata is removed, eliminating the timing
> >>>>>> window that caused the race condition.
> >>>>>>
> >>>>>> Signed-off-by: Gregory Bell <grbell@redhat.com>
> >>>>>> ---
> >>>>>> kernel/bpf/btf.c | 7 ++++++-
> >>>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>>>> index 09fcbb125155..81c9f46a7bb7 100644
> >>>>>> --- a/kernel/bpf/btf.c
> >>>>>> +++ b/kernel/bpf/btf.c
> >>>>>> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> >>>>>> list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> >>>>>> if (btf_mod->module != module)
> >>>>>> continue;
> >>>>>> -
> >>>>>> + unsigned int timeout = 1000;
> >>>>>> + while (refcount_read(&btf_mod->btf->refcnt) > 1
> >>>>>> + && timeout > 0 ) {
> >>>>>> + msleep(1);
> >>>>>> + timeout--;
> >>>>>> + }
> >>>>>
> >>>>> This isn't the right way to address such issues. Your diagnosis is
> >>>>> correct, but the fix is just papering over the real problem.
> >>>>
> >>>> Second that the fix is not right.
>
> pw-bot: cr
>
> >>>>
> >>>>>
> >>>>> The better way would be to block the discovery of such BTFs once the
> >>>>> module is unloaded.
> >>>>> I.e. once we return to userspace after module unload, subsequent calls
> >>>>> to obtain an fd for such BTFs should not succeed.
> >>>>> If someone is racing to look it up while the unload happens, so be it,
> >>>>> they will fail anyway.
> >>>>>
> >>>>> I would use a bool flag in struct btf, set it inside the critical
> >>>>> section at this point with WRITE_ONCE, then test it early in
> >>>>> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
> >>>>> condition.
> >>>>> Using READ_ONCE of course.
> >>>>> Also, we need to add it to btf_get_fd_by_id, so that conversions
> >>>>> from stale IDs to FDs don't succeed.
> >>>>> Looking at load_module_btfs(), it already handles the ENOENT case when
> >>>>> id lookup succeeds, but fd lookup fails.
> >>>>> So for stale ids it should just skip over their errors if we return
> >>>>> the same ENOENT as the refcount_inc_not_zero case.
> >>>>> We will still report stale ids, but not allow their conversion.
> >>>>> We could filter id lookup too but it will penalize the common code and
> >>>>> we need extra radix tree lookup just for BTF case.
> >>>>>
> >>>>> For me locally, these changes seem to address the problem. I could
> >>>>> send the fix, but if you would like to take a stab at the above
> >>>>> change, please go ahead.
> >>>>>
> >>>>
> >>>> I guess this is worth a fix because it makes btf_get_fd_by_id() less
> >>>> likely to return a stale id. However, there is still a small window
> >>>> between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
> >>>> creation can still fail.
> >>>
> >>> That's fine, it means that module unload is happening in parallel, so
> >>> it's racy and non-deterministic anyway.
> >>> When the user controls the module unload though, they should not be
> >>> getting stale BTF fds once module unload has returned to user space.
> >>> At that point all associated artifacts shouldn't be discoverable.
> >>>
> >>>>
> >>>> As for the solution, there already is btf_mod->flags. So maybe instead
> >>>> of adding a bool, just test the flag?
> >>>
> >>> You mean struct btf instead of btf_mod? I don't see the flags field there.
> >>> The btf_mod->flags won't work, it's already deleted and freed by this
> >>> time, and requires us to go first from the BTF to the btf_mod.
> >>> So it has to be in struct btf.
> >>>
> >>
> >> I was thinking about updating btf_module->flags when handling
> >> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
> >> btf_try_get_module() to check flags if it's module BTF. If the module
> >> is already freed then it returns -ENOENT.
> >
> > Hmm, interesting thought.
> > I guess the only downside is that it's O(n) lookup under mutex, which
> > can be costly with lots of modules, and always penalizes
> > get_fd_by_id() for handling race.
> > We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
> > where we may repeat it multiple times in the loop, so the cost is
> > amplified.
> >
>
> Is btf->id still needed somewhere after MODULE_STATE_GOING? Otherwise,
> directly removing btf->id from btf_idr in btf_module_notify() should
> stop the discovery.
I looked at this; I think we will need to reset btf->id too.
Otherwise, while btf_free_id is idempotent (a no-op on an already-freed
id), the released id can be reused, and if we don't guard btf_free_id
in btf_put it will remove the id of some other BTF initialized in the
meantime.
We can likely set a flag or something to skip it in btf_put(). As long
as the module is holding a reference nothing else should do it.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-11 9:30 ` Kumar Kartikeya Dwivedi
@ 2026-03-11 9:34 ` Kumar Kartikeya Dwivedi
2026-03-11 12:21 ` Alan Maguire
1 sibling, 0 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-11 9:34 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Amery Hung, Gregory Bell, bpf, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On Wed, 11 Mar 2026 at 10:30, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, 11 Mar 2026 at 01:20, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> >
> > On 3/10/26 4:35 PM, Kumar Kartikeya Dwivedi wrote:
> > > On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
> > >>
> > >> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
> > >> <memxor@gmail.com> wrote:
> > >>>
> > >>> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
> > >>>>
> > >>>> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
> > >>>> <memxor@gmail.com> wrote:
> > >>>>>
> > >>>>> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
> > >>>>>>
> > >>>>>> When running BPF struct_ops tests consecutively, the second test fails
> > >>>>>> because BTF object references take longer to be released while their
> > >>>>>> associated module is removed from btf_modules list immediately
> > >>>>>> during module unload. This creates a race condition where the second
> > >>>>>> test retrieves the cached BTF object but cannot find valid module
> > >>>>>> information, causing btf_try_get_module() to return NULL.
> > >>>>>>
> > >>>>>> Fix this by modifying MODULE_STATE_GOING handling to wait for
> > >>>>>> active BTF references to be released before completing module
> > >>>>>> cleanup. This ensures BTF objects are properly cleaned up from the
> > >>>>>> cache before their module metadata is removed, eliminating the timing
> > >>>>>> window that caused the race condition.
> > >>>>>>
> > >>>>>> Signed-off-by: Gregory Bell <grbell@redhat.com>
> > >>>>>> ---
> > >>>>>> kernel/bpf/btf.c | 7 ++++++-
> > >>>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
> > >>>>>>
> > >>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > >>>>>> index 09fcbb125155..81c9f46a7bb7 100644
> > >>>>>> --- a/kernel/bpf/btf.c
> > >>>>>> +++ b/kernel/bpf/btf.c
> > >>>>>> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > >>>>>> list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> > >>>>>> if (btf_mod->module != module)
> > >>>>>> continue;
> > >>>>>> -
> > >>>>>> + unsigned int timeout = 1000;
> > >>>>>> + while (refcount_read(&btf_mod->btf->refcnt) > 1
> > >>>>>> + && timeout > 0 ) {
> > >>>>>> + msleep(1);
> > >>>>>> + timeout--;
> > >>>>>> + }
> > >>>>>
> > >>>>> This isn't the right way to address such issues. Your diagnosis is
> > >>>>> correct, but the fix is just papering over the real problem.
> > >>>>
> > >>>> Second that the fix is not right.
> >
> > pw-bot: cr
> >
> > >>>>
> > >>>>>
> > >>>>> The better way would be to block the discovery of such BTFs once the
> > >>>>> module is unloaded.
> > >>>>> I.e. once we return to userspace after module unload, subsequent calls
> > >>>>> to obtain an fd for such BTFs should not succeed.
> > >>>>> If someone is racing to look it up while the unload happens, so be it,
> > >>>>> they will fail anyway.
> > >>>>>
> > >>>>> I would use a bool flag in struct btf, set it inside the critical
> > >>>>> section at this point with WRITE_ONCE, then test it early in
> > >>>>> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
> > >>>>> condition.
> > >>>>> Using READ_ONCE of course.
> > >>>>> Also, we need to add it to btf_get_fd_by_id, so that conversions
> > >>>>> from stale IDs to FDs don't succeed.
> > >>>>> Looking at load_module_btfs(), it already handles the ENOENT case when
> > >>>>> id lookup succeeds, but fd lookup fails.
> > >>>>> So for stale ids it should just skip over their errors if we return
> > >>>>> the same ENOENT as the refcount_inc_not_zero case.
> > >>>>> We will still report stale ids, but not allow their conversion.
> > >>>>> We could filter id lookup too but it will penalize the common code and
> > >>>>> we need extra radix tree lookup just for BTF case.
> > >>>>>
> > >>>>> For me locally, these changes seem to address the problem. I could
> > >>>>> send the fix, but if you would like to take a stab at the above
> > >>>>> change, please go ahead.
> > >>>>>
> > >>>>
> > >>>> I guess this is worth a fix because it makes btf_get_fd_by_id() less
> > >>>> likely to return a stale id. However, there is still a small window
> > >>>> between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
> > >>>> creation can still fail.
> > >>>
> > >>> That's fine, it means that module unload is happening in parallel, so
> > >>> it's racy and non-deterministic anyway.
> > >>> When the user controls the module unload though, they should not be
> > >>> getting stale BTF fds once module unload has returned to user space.
> > >>> At that point all associated artifacts shouldn't be discoverable.
> > >>>
> > >>>>
> > >>>> As for the solution, there already is btf_mod->flags. So maybe instead
> > >>>> of adding a bool, just test the flag?
> > >>>
> > >>> You mean struct btf instead of btf_mod? I don't see the flags field there.
> > >>> The btf_mod->flags won't work, it's already deleted and freed by this
> > >>> time, and requires us to go first from the BTF to the btf_mod.
> > >>> So it has to be in struct btf.
> > >>>
> > >>
> > >> I was thinking about updating btf_module->flags when handling
> > >> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
> > >> btf_try_get_module() to check flags if it's module BTF. If the module
> > >> is already freed then it returns -ENOENT.
> > >
> > > Hmm, interesting thought.
> > > I guess the only downside is that it's O(n) lookup under mutex, which
> > > can be costly with lots of modules, and always penalizes
> > > get_fd_by_id() for handling race.
> > > We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
> > > where we may repeat it multiple times in the loop, so the cost is
> > > amplified.
> > >
> >
> > Is btf->id still needed somewhere after MODULE_STATE_GOING? Otherwise,
> > directly removing btf->id from btf_idr in btf_module_notify() should
> > stop the discovery.
>
> I looked at this, I think we will need to reset the btf->id too.
> Otherwise while btf_free_id is idempotent (no-op on free id), the
> released id can be reused and if we don't guard btf_free_id in btf_put
> it will remove it for some other BTF initialized in the meantime.
> We can likely set a flag or something to skip it in btf_put(). As long
> as the module is holding a reference nothing else should do it.
Another way to make this work is to skip btf_free_id() in btf_put() for
btf_is_module() BTFs, and always do it in the MODULE_STATE_GOING case.
The module is always going to release its reference, and precisely at
that point we want the idr to be freed.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-11 9:30 ` Kumar Kartikeya Dwivedi
2026-03-11 9:34 ` Kumar Kartikeya Dwivedi
@ 2026-03-11 12:21 ` Alan Maguire
2026-03-11 13:05 ` Kumar Kartikeya Dwivedi
1 sibling, 1 reply; 21+ messages in thread
From: Alan Maguire @ 2026-03-11 12:21 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, Martin KaFai Lau
Cc: Amery Hung, Gregory Bell, bpf, ast, daniel, andrii, eddyz87, song,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa
On 11/03/2026 09:30, Kumar Kartikeya Dwivedi wrote:
> On Wed, 11 Mar 2026 at 01:20, Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>>
>> On 3/10/26 4:35 PM, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
>>>>
>>>> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
>>>> <memxor@gmail.com> wrote:
>>>>>
>>>>> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
>>>>>>
>>>>>> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
>>>>>> <memxor@gmail.com> wrote:
>>>>>>>
>>>>>>> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
>>>>>>>>
>>>>>>>> When running BPF struct_ops tests consecutively, the second test fails
>>>>>>>> because BTF object references take longer to be released while their
>>>>>>>> associated module is removed from btf_modules list immediately
>>>>>>>> during module unload. This creates a race condition where the second
>>>>>>>> test retrieves the cached BTF object but cannot find valid module
>>>>>>>> information, causing btf_try_get_module() to return NULL.
>>>>>>>>
>>>>>>>> Fix this by modifying MODULE_STATE_GOING handling to wait for
>>>>>>>> active BTF references to be released before completing module
>>>>>>>> cleanup. This ensures BTF objects are properly cleaned up from the
>>>>>>>> cache before their module metadata is removed, eliminating the timing
>>>>>>>> window that caused the race condition.
>>>>>>>>
>>>>>>>> Signed-off-by: Gregory Bell <grbell@redhat.com>
>>>>>>>> ---
>>>>>>>> kernel/bpf/btf.c | 7 ++++++-
>>>>>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>>>>>> index 09fcbb125155..81c9f46a7bb7 100644
>>>>>>>> --- a/kernel/bpf/btf.c
>>>>>>>> +++ b/kernel/bpf/btf.c
>>>>>>>> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>>>>>>>> list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
>>>>>>>> if (btf_mod->module != module)
>>>>>>>> continue;
>>>>>>>> -
>>>>>>>> + unsigned int timeout = 1000;
>>>>>>>> + while (refcount_read(&btf_mod->btf->refcnt) > 1
>>>>>>>> + && timeout > 0 ) {
>>>>>>>> + msleep(1);
>>>>>>>> + timeout--;
>>>>>>>> + }
>>>>>>>
>>>>>>> This isn't the right way to address such issues. Your diagnosis is
>>>>>>> correct, but the fix is just papering over the real problem.
>>>>>>
>>>>>> Second that the fix is not right.
>>
>> pw-bot: cr
>>
>>>>>>
>>>>>>>
>>>>>>> The better way would be to block the discovery of such BTFs once the
>>>>>>> module is unloaded.
>>>>>>> I.e. once we return to userspace after module unload, subsequent calls
>>>>>>> to obtain an fd for such BTFs should not succeed.
>>>>>>> If someone is racing to look it up while the unload happens, so be it,
>>>>>>> they will fail anyway.
>>>>>>>
>>>>>>> I would use a bool flag in struct btf, set it inside the critical
>>>>>>> section at this point with WRITE_ONCE, then test it early in
>>>>>>> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
>>>>>>> condition.
>>>>>>> Using READ_ONCE of course.
>>>>>>> Also, we need to add it to btf_get_fd_by_id, so that conversions
>>>>>>> from stale IDs to FDs don't succeed.
>>>>>>> Looking at load_module_btfs(), it already handles the ENOENT case when
>>>>>>> id lookup succeeds, but fd lookup fails.
>>>>>>> So for stale ids it should just skip over their errors if we return
>>>>>>> the same ENOENT as the refcount_inc_not_zero case.
>>>>>>> We will still report stale ids, but not allow their conversion.
>>>>>>> We could filter id lookup too but it will penalize the common code and
>>>>>>> we need extra radix tree lookup just for BTF case.
>>>>>>>
>>>>>>> For me locally, these changes seem to address the problem. I could
>>>>>>> send the fix, but if you would like to take a stab at the above
>>>>>>> change, please go ahead.
>>>>>>>
>>>>>>
>>>>>> I guess this is worth a fix because it makes btf_get_fd_by_id() less
>>>>>> likely to return a stale id. However, there is still a small window
>>>>>> between btf_get_fd_by_id() and btf_try_get_module() so struct_ops map
>>>>>> creation can still fail.
>>>>>
>>>>> That's fine, it means that module unload is happening in parallel, so
>>>>> it's racy and non-deterministic anyway.
>>>>> When the user controls the module unload though, they should not be
>>>>> getting stale BTF fds once module unload has returned to user space.
>>>>> At that point all associated artifacts shouldn't be discoverable.
>>>>>
>>>>>>
>>>>>> As for the solution, there already is btf_mod->flags. So maybe instead
>>>>>> of adding a bool, just test the flag?
>>>>>
>>>>> You mean struct btf instead of btf_mod? I don't see the flags field there.
>>>>> The btf_mod->flags won't work, it's already deleted and freed by this
>>>>> time, and requires us to go first from the BTF to the btf_mod.
>>>>> So it has to be in struct btf.
>>>>>
>>>>
>>>> I was thinking about updating btf_module->flags when handling
>>>> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
>>>> btf_try_get_module() to check flags if it's module BTF. If the module
>>>> is already freed then it returns -ENOENT.
>>>
>>> Hmm, interesting thought.
>>> I guess the only downside is that it's O(n) lookup under mutex, which
>>> can be costly with lots of modules, and always penalizes
>>> get_fd_by_id() for handling race.
>>> We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
>>> where we may repeat it multiple times in the loop, so the cost is
>>> amplified.
>>>
>>
>> Is btf->id still needed somewhere after MODULE_STATE_GOING? Otherwise,
>> directly removing btf->id from btf_idr in btf_module_notify() should
>> stop the discovery.
>
> I looked at this, I think we will need to reset the btf->id too.
> Otherwise while btf_free_id is idempotent (no-op on free id), the
> released id can be reused and if we don't guard btf_free_id in btf_put
> it will remove it for some other BTF initialized in the meantime.
> We can likely set a flag or something to skip it in btf_put(). As long
> as the module is holding a reference nothing else should do it.
>
I wonder if it'd make sense to reduce the cost of the btf -> module
lookup so that we could check the flag in contexts like
btf_get_fd_by_id()? AI suggested reworking the btf_module lookup from a
linked list to a pair of xarrays keyed by struct module * and struct
btf *, and it seems reasonable (testing it now). An easy win might be
to get rid of the strcmp() against "vmlinux" in btf_is_module(),
replacing it with the simpler predicate

	btf->kernel_btf && btf->base_btf;

to reduce the overhead of many btf_is_module() checks. What do you think?
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 0:20 ` Martin KaFai Lau
2026-03-11 9:30 ` Kumar Kartikeya Dwivedi
@ 2026-03-11 13:03 ` Kumar Kartikeya Dwivedi
2026-03-11 13:26 ` Kumar Kartikeya Dwivedi
` (3 more replies)
1 sibling, 4 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-11 13:03 UTC (permalink / raw)
To: martin.lau
Cc: ameryhung, andrii, ast, bpf, daniel, eddyz87, grbell, haoluo,
john.fastabend, jolsa, kpsingh, memxor, sdf, song, yonghong.song
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
kernel/bpf/btf.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 09fcbb125155..d18e218049ed 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -1883,10 +1883,17 @@ void btf_get(struct btf *btf)
refcount_inc(&btf->refcnt);
}
+static bool btf_is_module(const struct btf *btf);
+
void btf_put(struct btf *btf)
{
if (btf && refcount_dec_and_test(&btf->refcnt)) {
- btf_free_id(btf);
+ /*
+ * The IDR entry for module BTF is freed when the module's BTF
+ * reference is dropped, which may happen before the final put.
+ */
+ if (!btf_is_module(btf))
+ btf_free_id(btf);
call_rcu(&btf->rcu, btf_free_rcu);
}
}
@@ -8383,6 +8390,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
if (btf_mod->module != module)
continue;
+ /*
+ * For modules, free the BTF IDR entry as soon as the module
+ * goes away to disable BTF discovery; btf_try_get_module()
+ * on such BTFs will fail anyway.
+ */
+ btf_free_id(btf_mod->btf);
list_del(&btf_mod->list);
if (btf_mod->sysfs_attr)
sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
--
2.52.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops
2026-03-11 12:21 ` Alan Maguire
@ 2026-03-11 13:05 ` Kumar Kartikeya Dwivedi
0 siblings, 0 replies; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-11 13:05 UTC (permalink / raw)
To: Alan Maguire
Cc: Martin KaFai Lau, Amery Hung, Gregory Bell, bpf, ast, daniel,
andrii, eddyz87, song, yonghong.song, john.fastabend, kpsingh,
sdf, haoluo, jolsa
On Wed, 11 Mar 2026 at 13:22, Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 11/03/2026 09:30, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 11 Mar 2026 at 01:20, Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>
> >>
> >> On 3/10/26 4:35 PM, Kumar Kartikeya Dwivedi wrote:
> >>> On Wed, 11 Mar 2026 at 00:17, Amery Hung <ameryhung@gmail.com> wrote:
> >>>>
> >>>> On Tue, Mar 10, 2026 at 3:48 PM Kumar Kartikeya Dwivedi
> >>>> <memxor@gmail.com> wrote:
> >>>>>
> >>>>> On Tue, 10 Mar 2026 at 23:38, Amery Hung <ameryhung@gmail.com> wrote:
> >>>>>>
> >>>>>> On Tue, Mar 10, 2026 at 2:41 PM Kumar Kartikeya Dwivedi
> >>>>>> <memxor@gmail.com> wrote:
> >>>>>>>
> >>>>>>> On Tue, 10 Mar 2026 at 21:21, Gregory Bell <grbell@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>> When running BPF struct_ops tests consecutively, the second test fails
> >>>>>>>> because BTF object references take longer to be released while their
> >>>>>>>> associated module is removed from btf_modules list immediately
> >>>>>>>> during module unload. This creates a race condition where the second
> >>>>>>>> test retrieves the cached BTF object but cannot find valid module
> >>>>>>>> information, causing btf_try_get_module() to return NULL.
> >>>>>>>>
> >>>>>>>> Fix this by modifying MODULE_STATE_GOING handling to wait for
> >>>>>>>> active BTF references to be released before completing module
> >>>>>>>> cleanup. This ensures BTF objects are properly cleaned up from the
> >>>>>>>> cache before their module metadata is removed, eliminating the timing
> >>>>>>>> window that caused the race condition.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Gregory Bell <grbell@redhat.com>
> >>>>>>>> ---
> >>>>>>>> kernel/bpf/btf.c | 7 ++++++-
> >>>>>>>> 1 file changed, 6 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >>>>>>>> index 09fcbb125155..81c9f46a7bb7 100644
> >>>>>>>> --- a/kernel/bpf/btf.c
> >>>>>>>> +++ b/kernel/bpf/btf.c
> >>>>>>>> @@ -8382,7 +8382,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> >>>>>>>> list_for_each_entry_safe(btf_mod, tmp, &btf_modules, list) {
> >>>>>>>> if (btf_mod->module != module)
> >>>>>>>> continue;
> >>>>>>>> -
> >>>>>>>> + unsigned int timeout = 1000;
> >>>>>>>> + while (refcount_read(&btf_mod->btf->refcnt) > 1
> >>>>>>>> + && timeout > 0 ) {
> >>>>>>>> + msleep(1);
> >>>>>>>> + timeout--;
> >>>>>>>> + }
> >>>>>>>
> >>>>>>> This isn't the right way to address such issues. Your diagnosis is
> >>>>>>> correct, but the fix is just papering over the real problem.
> >>>>>>
> >>>>>> Second that the fix is not right.
> >>
> >> pw-bot: cr
> >>
> >>>>>>
> >>>>>>>
> >>>>>>> The better way would be to block the discovery of such BTFs once the
> >>>>>>> module is unloaded.
> >>>>>>> I.e. once we return to userspace after module unload, subsequent calls
> >>>>>>> to obtain an fd for such BTFs should not succeed.
> >>>>>>> If someone is racing to look it up while the unload happens, so be it,
> >>>>>>> they will fail anyway.
> >>>>>>>
> >>>>>>> I would use a bool flag in struct btf, set it inside the critical
> >>>>>>> section at this point with WRITE_ONCE, then test it early in
> >>>>>>> bpf_find_btf_id and bpf_core_find_cands with || in !btf_is_module
> >>>>>>> condition.
> >>>>>>> Using READ_ONCE of course.
> >>>>>>> Also, we need to add it to btf_get_fd_by_id, so that conversions
> >>>>>>> from stale IDs to FDs don't succeed.
> >>>>>>> Looking at load_module_btfs(), it already handles the ENOENT case when
> >>>>>>> id lookup succeeds, but fd lookup fails.
> >>>>>>> So for stale ids it should just skip over their errors if we return
> >>>>>>> the same ENOENT as the refcount_inc_not_zero case.
> >>>>>>> We will still report stale ids, but not allow their conversion.
> >>>>>>> We could filter id lookup too but it will penalize the common code and
> >>>>>>> we need extra radix tree lookup just for BTF case.
> >>>>>>>
> >>>>>>> For me locally, these changes seem to address the problem. I could
> >>>>>>> send the fix, but if you would like to take a stab at the above
> >>>>>>> change, please go ahead.
> >>>>>>>
> >>>>>>
> >>>>>> I guess this is worth a fix because it makes btf_get_fd_by_id() less
> >>>>>> likely to return a stale id. However, there is still a small window
> >>>>>> between btf_get_fd_by_id() and btf_try_get_module(), so struct_ops map
> >>>>>> creation can still fail.
> >>>>>
> >>>>> That's fine, it means that module unload is happening in parallel, so
> >>>>> it's racy and non-deterministic anyway.
> >>>>> When the user controls the module unload though, they should not be
> >>>>> getting stale BTF fds once module unload has returned to user space.
> >>>>> At that point all associated artifacts shouldn't be discoverable.
> >>>>>
> >>>>>>
> >>>>>> As for the solution, there already is btf_mod->flags. So maybe instead
> >>>>>> of adding a bool, just test the flag?
> >>>>>
> >>>>> You mean struct btf instead of btf_mod? I don't see the flags field there.
> >>>>> The btf_mod->flags won't work, it's already deleted and freed by this
> >>>>> time, and requires us to go first from the BTF to the btf_mod.
> >>>>> So it has to be in struct btf.
> >>>>>
> >>>>
> >>>> I was thinking about updating btf_module->flags when handling
> >>>> MODULE_STATE_GOING. Then btf_get_fd_by_id() can call
> >>>> btf_try_get_module() to check the flags if it's a module BTF. If the
> >>>> module is already freed, then it returns -ENOENT.
> >>>
> >>> Hmm, interesting thought.
> >>> I guess the only downside is that it's an O(n) lookup under a mutex, which
> >>> can be costly with lots of modules, and always penalizes
> >>> get_fd_by_id() just to handle the race.
> >>> We also need to repeat it in bpf_core_find_cands and bpf_find_btf_id,
> >>> where we may repeat it multiple times in the loop, so the cost is
> >>> amplified.
> >>>
> >>
> >> Is btf->id still needed somewhere after MODULE_STATE_GOING? Otherwise,
> >> directly removing btf->id from btf_idr in btf_module_notify() should
> >> stop the discovery.
> >
> > I looked at this; I think we will need to reset btf->id too.
> > While btf_free_id is idempotent (a no-op on an already-freed id), the
> > released id can be reused, so if we don't guard btf_free_id in btf_put
> > it may remove the id of some other BTF initialized in the meantime.
> > We can likely set a flag or something to skip it in btf_put(). As long
> > as the module is holding a reference, nothing else should free it.
> >
>
> I wonder if it'd make sense to reduce the cost of btf -> module lookup so that
> we could check the flag in contexts like btf_get_fd_by_id()? AI suggested reworking
> btf_module lookup from a linked list to a pair of xarrays keyed by struct module *
> and struct btf *, and it seems reasonable (testing it now). An easy win might be
> to get rid of the strcmp() to vmlinux in btf_is_module(), replacing it with the
> simpler predicate
>
> btf->kernel_btf && btf->base_btf;
>
> to reduce the overhead of frequent btf_is_module() calls. What do you think?
That is probably worth doing if btf_module lookup is taking up
significant CPU time, but I think as a fix it would be too much churn,
and be harder to backport.
I shared a diff trying out Martin's idea and it seems to fix the issue.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 13:03 ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
@ 2026-03-11 13:26 ` Kumar Kartikeya Dwivedi
2026-03-11 16:48 ` Greg Bell
2026-03-11 13:40 ` bot+bpf-ci
` (2 subsequent siblings)
3 siblings, 1 reply; 21+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-03-11 13:26 UTC (permalink / raw)
To: martin.lau, grbell
Cc: ameryhung, andrii, ast, bpf, daniel, eddyz87, haoluo,
john.fastabend, jolsa, kpsingh, sdf, song, yonghong.song
On Wed, 11 Mar 2026 at 14:03, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> kernel/bpf/btf.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 09fcbb125155..d18e218049ed 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -1883,10 +1883,17 @@ void btf_get(struct btf *btf)
> refcount_inc(&btf->refcnt);
> }
>
> +static bool btf_is_module(const struct btf *btf);
> +
This probably needs to be 'bool btf_is_module(const struct btf *btf);'
, i.e. without static.
I didn't get a warning in my build, strangely enough, but nevertheless.
Gregory, please give it a spin and see if it fixes the bug for you.
Martin, please let me know if this looks ok to you.
> void btf_put(struct btf *btf)
> {
> if (btf && refcount_dec_and_test(&btf->refcnt)) {
> - btf_free_id(btf);
> + /*
> + * IDR for module BTFs is freed when module's BTF reference is
> + * dropped, which may happen before the final put.
> + */
> + if (!btf_is_module(btf))
> + btf_free_id(btf);
> call_rcu(&btf->rcu, btf_free_rcu);
> }
> }
> @@ -8383,6 +8390,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> if (btf_mod->module != module)
> continue;
>
> + /*
> + * For modules, we do the freeing of BTF IDR as soon as
> + * module goes away to disable BTF discovery, since the
> + * btf_try_get_module() on such BTFs will fail.
> + */
> + btf_free_id(btf_mod->btf);
> list_del(&btf_mod->list);
> if (btf_mod->sysfs_attr)
> sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
> --
> 2.52.0
>
* Re: [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 13:03 ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
2026-03-11 13:26 ` Kumar Kartikeya Dwivedi
@ 2026-03-11 13:40 ` bot+bpf-ci
2026-03-11 19:10 ` Martin KaFai Lau
2026-03-12 3:08 ` kernel test robot
3 siblings, 0 replies; 21+ messages in thread
From: bot+bpf-ci @ 2026-03-11 13:40 UTC (permalink / raw)
To: memxor, martin.lau
Cc: ameryhung, andrii, ast, bpf, daniel, eddyz87, grbell, haoluo,
john.fastabend, jolsa, kpsingh, memxor, sdf, song, yonghong.song,
ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -1883,10 +1883,17 @@ void btf_get(struct btf *btf)
> refcount_inc(&btf->refcnt);
> }
>
> +static bool btf_is_module(const struct btf *btf);
btf_is_module() is already declared with external linkage in
include/linux/btf.h:
bool btf_is_module(const struct btf *btf);
and btf.c includes that header. Won't the static forward
declaration here cause a build error? GCC rejects a static
declaration that follows a non-static declaration of the same
function. Dropping the 'static' keyword from this forward
declaration should fix it.
> +
> void btf_put(struct btf *btf)
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22954450342
* Re: [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 13:26 ` Kumar Kartikeya Dwivedi
@ 2026-03-11 16:48 ` Greg Bell
0 siblings, 0 replies; 21+ messages in thread
From: Greg Bell @ 2026-03-11 16:48 UTC (permalink / raw)
To: bpf
Thanks for this Kumar! I confirmed that this does fix the bug for us.
>> +static bool btf_is_module(const struct btf *btf);
>> +
>
>This probably needs to be 'bool btf_is_module(const struct btf *btf);'
>, i.e. without static.
>I didn't get a warning in my build, strangely enough, but nevertheless.
For what it's worth, for me it did throw an error for the static
declaration, and I had to drop the static keyword:
kernel/bpf/btf.c:1886:13: error: static declaration of ‘btf_is_module’
follows non-static declaration
1886 | static bool btf_is_module(const struct btf *btf);
| ^~~~~~~~~~~~~
In file included from ./include/linux/bpf.h:29,
from ./include/linux/bpf_verifier.h:7,
from kernel/bpf/btf.c:19:
./include/linux/btf.h:219:6: note: previous declaration of
‘btf_is_module’ with type ‘bool(const struct btf *)’ {aka ‘_Bool(co
nst struct btf *)’}
219 | bool btf_is_module(const struct btf *btf);
| ^~~~~~~~~~~~~
make[4]: *** [scripts/Makefile.build:289: kernel/bpf/btf.o] Error 1
On Wed, Mar 11, 2026 at 9:26 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, 11 Mar 2026 at 14:03, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > kernel/bpf/btf.c | 15 ++++++++++++++-
> > 1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 09fcbb125155..d18e218049ed 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -1883,10 +1883,17 @@ void btf_get(struct btf *btf)
> > refcount_inc(&btf->refcnt);
> > }
> >
> > +static bool btf_is_module(const struct btf *btf);
> > +
>
> This probably needs to be 'bool btf_is_module(const struct btf *btf);'
> , i.e. without static.
> I didn't get a warning in my build, strangely enough, but nevertheless.
>
> Gregory, please give it a spin and see if it fixes the bug for you.
> Martin, please let me know if this looks ok to you.
>
> > void btf_put(struct btf *btf)
> > {
> > if (btf && refcount_dec_and_test(&btf->refcnt)) {
> > - btf_free_id(btf);
> > + /*
> > + * IDR for module BTFs is freed when module's BTF reference is
> > + * dropped, which may happen before the final put.
> > + */
> > + if (!btf_is_module(btf))
> > + btf_free_id(btf);
> > call_rcu(&btf->rcu, btf_free_rcu);
> > }
> > }
> > @@ -8383,6 +8390,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > if (btf_mod->module != module)
> > continue;
> >
> > + /*
> > + * For modules, we do the freeing of BTF IDR as soon as
> > + * module goes away to disable BTF discovery, since the
> > + * btf_try_get_module() on such BTFs will fail.
> > + */
> > + btf_free_id(btf_mod->btf);
> > list_del(&btf_mod->list);
> > if (btf_mod->sysfs_attr)
> > sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
> > --
> > 2.52.0
> >
>
* Re: [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 13:03 ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
2026-03-11 13:26 ` Kumar Kartikeya Dwivedi
2026-03-11 13:40 ` bot+bpf-ci
@ 2026-03-11 19:10 ` Martin KaFai Lau
2026-03-11 19:17 ` Andrii Nakryiko
2026-03-12 3:08 ` kernel test robot
3 siblings, 1 reply; 21+ messages in thread
From: Martin KaFai Lau @ 2026-03-11 19:10 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: ameryhung, andrii, ast, bpf, daniel, eddyz87, grbell, haoluo,
john.fastabend, jolsa, kpsingh, sdf, song, yonghong.song
On 3/11/26 6:03 AM, Kumar Kartikeya Dwivedi wrote:
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> kernel/bpf/btf.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 09fcbb125155..d18e218049ed 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -1883,10 +1883,17 @@ void btf_get(struct btf *btf)
> refcount_inc(&btf->refcnt);
> }
>
> +static bool btf_is_module(const struct btf *btf);
> +
> void btf_put(struct btf *btf)
> {
> if (btf && refcount_dec_and_test(&btf->refcnt)) {
> - btf_free_id(btf);
> + /*
> + * IDR for module BTFs is freed when module's BTF reference is
> + * dropped, which may happen before the final put.
> + */
> + if (!btf_is_module(btf))
> + btf_free_id(btf);
> call_rcu(&btf->rcu, btf_free_rcu);
> }
> }
> @@ -8383,6 +8390,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> if (btf_mod->module != module)
> continue;
>
> + /*
> + * For modules, we do the freeing of BTF IDR as soon as
> + * module goes away to disable BTF discovery, since the
> + * btf_try_get_module() on such BTFs will fail.
> + */
> + btf_free_id(btf_mod->btf);
> list_del(&btf_mod->list);
The change makes sense.
I looked up a bit because I am interested in who will use the module's
btf->id without holding the module's refcount. fwiw, here is what I read.
For non struct_ops prog, the module refcount is held in prog->aux->mod
after commit 31bf1dbccfb0, so should be fine on the
bpf_trampoline_compute_key() and bpf_prog_get_info_by_fd().
For struct_ops prog, the module refcount is not held in prog->aux->mod.
I don't think it has to be, since the prog is not directly attached to the module.
The only downside is that bpf_prog_get_info_by_fd() may report an
outdated info.attach_btf_obj_id. If that is an issue, maybe just hold the
module refcount there as well.
For struct_ops map, it attaches the struct_ops prog to the module, so
the module refcount is held. It should be fine.
* Re: [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 19:10 ` Martin KaFai Lau
@ 2026-03-11 19:17 ` Andrii Nakryiko
2026-03-11 19:55 ` Martin KaFai Lau
0 siblings, 1 reply; 21+ messages in thread
From: Andrii Nakryiko @ 2026-03-11 19:17 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Kumar Kartikeya Dwivedi, ameryhung, andrii, ast, bpf, daniel,
eddyz87, grbell, haoluo, john.fastabend, jolsa, kpsingh, sdf,
song, yonghong.song
On Wed, Mar 11, 2026 at 12:10 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
>
>
> On 3/11/26 6:03 AM, Kumar Kartikeya Dwivedi wrote:
Commit message is missing?...
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> > kernel/bpf/btf.c | 15 ++++++++++++++-
> > 1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 09fcbb125155..d18e218049ed 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -1883,10 +1883,17 @@ void btf_get(struct btf *btf)
> > refcount_inc(&btf->refcnt);
> > }
> >
> > +static bool btf_is_module(const struct btf *btf);
> > +
> > void btf_put(struct btf *btf)
> > {
> > if (btf && refcount_dec_and_test(&btf->refcnt)) {
> > - btf_free_id(btf);
> > + /*
> > + * IDR for module BTFs is freed when module's BTF reference is
> > + * dropped, which may happen before the final put.
> > + */
> > + if (!btf_is_module(btf))
> > + btf_free_id(btf);
> > call_rcu(&btf->rcu, btf_free_rcu);
> > }
> > }
> > @@ -8383,6 +8390,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> > if (btf_mod->module != module)
> > continue;
> >
> > + /*
> > + * For modules, we do the freeing of BTF IDR as soon as
> > + * module goes away to disable BTF discovery, since the
> > + * btf_try_get_module() on such BTFs will fail.
> > + */
> > + btf_free_id(btf_mod->btf);
> > list_del(&btf_mod->list);
>
> The change makes sense.
>
> I looked up a bit because I am interested in who will use the module's
> btf->id without holding the module's refcount. fwiw, here is what I read.
>
> For non struct_ops prog, the module refcount is held in prog->aux->mod
> after commit 31bf1dbccfb0, so should be fine on the
> bpf_trampoline_compute_key() and bpf_prog_get_info_by_fd().
>
> For struct_ops prog, the module refcount is not held in prog->aux->mod.
> I don't think it has to since it is not directly attached to the module.
If we have something referencing a module, we should probably get and
keep refcnt on it, no? Why would we have a pointer to a module we
cannot be sure is still alive when we access it, sounds too
error-prone to me.
> The only downside is the bpf_prog_get_info_by_fd() may report an
> out-dated info.attach_btf_obj_id. If it is an issue, maybe just hold the
> module refcount also.
>
> For struct_ops map, it attaches the struct_ops prog to the module, so
> the module refcount is held. It should be fine.
>
* Re: [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 19:17 ` Andrii Nakryiko
@ 2026-03-11 19:55 ` Martin KaFai Lau
0 siblings, 0 replies; 21+ messages in thread
From: Martin KaFai Lau @ 2026-03-11 19:55 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Kumar Kartikeya Dwivedi, ameryhung, andrii, ast, bpf, daniel,
eddyz87, grbell, haoluo, john.fastabend, jolsa, kpsingh, sdf,
song, yonghong.song
On 3/11/26 12:17 PM, Andrii Nakryiko wrote:
>> For struct_ops prog, the module refcount is not held in prog->aux->mod.
>> I don't think it has to since it is not directly attached to the module.
>
> If we have something referencing a module, we should probably get and
> keep refcnt on it, no? Why would we have a pointer to a module we
> cannot be sure is still alive when we access it, sounds too
> error-prone to me.
>
>> For struct_ops map, it attaches the struct_ops prog to the module, so
>> the module refcount is held. It should be fine.
The module refcount is held, but it is held by the struct_ops map. The
struct_ops prog doesn't use the module and doesn't have a pointer to the
module after it is loaded.
* Re: [PATCH] bpf: Release module BTF IDR before module unload
2026-03-11 13:03 ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
` (2 preceding siblings ...)
2026-03-11 19:10 ` Martin KaFai Lau
@ 2026-03-12 3:08 ` kernel test robot
3 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2026-03-12 3:08 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi, martin.lau
Cc: oe-kbuild-all, ameryhung, andrii, ast, bpf, daniel, eddyz87,
grbell, haoluo, john.fastabend, jolsa, kpsingh, memxor, sdf, song,
yonghong.song
Hi Kumar,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/net]
[also build test ERROR on bpf-next/master bpf/master linus/master v6.16-rc1 next-20260311]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/bpf-Release-module-BTF-IDR-before-module-unload/20260311-211036
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git net
patch link: https://lore.kernel.org/r/20260311130321.1055808-1-memxor%40gmail.com
patch subject: [PATCH] bpf: Release module BTF IDR before module unload
config: x86_64-rhel-9.4-ltp (https://download.01.org/0day-ci/archive/20260312/202603120435.9bfX2C28-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260312/202603120435.9bfX2C28-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603120435.9bfX2C28-lkp@intel.com/
All errors (new ones prefixed by >>):
>> kernel/bpf/btf.c:1886:13: error: static declaration of 'btf_is_module' follows non-static declaration
1886 | static bool btf_is_module(const struct btf *btf);
| ^~~~~~~~~~~~~
In file included from include/linux/bpf.h:29,
from include/linux/bpf_verifier.h:7,
from kernel/bpf/btf.c:19:
include/linux/btf.h:219:6: note: previous declaration of 'btf_is_module' with type 'bool(const struct btf *)' {aka '_Bool(const struct btf *)'}
219 | bool btf_is_module(const struct btf *btf);
| ^~~~~~~~~~~~~
vim +/btf_is_module +1886 kernel/bpf/btf.c
1885
> 1886 static bool btf_is_module(const struct btf *btf);
1887
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
end of thread, other threads:[~2026-03-12 3:09 UTC | newest]
Thread overview: 21+ messages
2026-03-10 20:21 [PATCH bpf-next 0/1] Fix BPF struct_ops BTF cleanup race condition Gregory Bell
2026-03-10 20:21 ` [PATCH bpf-next 1/1] bpf: Fix BTF module cleanup race condition in struct_ops Gregory Bell
2026-03-10 20:58 ` bot+bpf-ci
2026-03-10 21:39 ` Kumar Kartikeya Dwivedi
2026-03-10 22:38 ` Amery Hung
2026-03-10 22:47 ` Kumar Kartikeya Dwivedi
2026-03-10 23:17 ` Amery Hung
2026-03-10 23:35 ` Kumar Kartikeya Dwivedi
2026-03-11 0:20 ` Martin KaFai Lau
2026-03-11 9:30 ` Kumar Kartikeya Dwivedi
2026-03-11 9:34 ` Kumar Kartikeya Dwivedi
2026-03-11 12:21 ` Alan Maguire
2026-03-11 13:05 ` Kumar Kartikeya Dwivedi
2026-03-11 13:03 ` [PATCH] bpf: Release module BTF IDR before module unload Kumar Kartikeya Dwivedi
2026-03-11 13:26 ` Kumar Kartikeya Dwivedi
2026-03-11 16:48 ` Greg Bell
2026-03-11 13:40 ` bot+bpf-ci
2026-03-11 19:10 ` Martin KaFai Lau
2026-03-11 19:17 ` Andrii Nakryiko
2026-03-11 19:55 ` Martin KaFai Lau
2026-03-12 3:08 ` kernel test robot