* [PATCH bpf-next v5 0/2] bpf: Free special fields when update [lru_,]percpu_hash maps
@ 2025-11-04 14:27 Leon Hwang
2025-11-04 14:27 ` [PATCH bpf-next v5 1/2] " Leon Hwang
2025-11-04 14:27 ` [PATCH bpf-next v5 2/2] selftests/bpf: Add test to verify freeing the " Leon Hwang
0 siblings, 2 replies; 6+ messages in thread
From: Leon Hwang @ 2025-11-04 14:27 UTC (permalink / raw)
To: bpf
Cc: ast, andrii, daniel, martin.lau, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, memxor, ameryhung,
linux-kernel, kernel-patches-bot, Leon Hwang
In the discussion thread
"[PATCH bpf-next v9 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps"[1],
it was pointed out that missing calls to bpf_obj_free_fields() could
lead to memory leaks.
A selftest was added to confirm that this is indeed a real issue: the
refcount of a BPF_KPTR_REF field is not decremented when
bpf_obj_free_fields() is missing after copy_map_value[,_long]().
Further inspection of copy_map_value[,_long]() call sites revealed two
locations affected by this issue:
1. pcpu_copy_value()
2. htab_map_update_elem() when used with BPF_F_LOCK
A similar case occurs when updating local storage maps with BPF_F_LOCK.
This series fixes the cases where BPF_F_LOCK is not involved by
properly calling bpf_obj_free_fields() after copy_map_value[,_long](),
and adds a selftest to verify the fix.
The remaining cases involving BPF_F_LOCK will be addressed in a
separate patch set after the series
"bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps"
is applied.
Changes:
v4 -> v5:
* Use a local variable to store the this_cpu_ptr()/per_cpu_ptr() result,
and reuse it between copy_map_value[,_long]() and
bpf_obj_free_fields() in patch #1 (per Andrii).
* Drop patch #2 and #3, because the combination of BPF_F_LOCK with other
special fields (except for BPF_SPIN_LOCK) will be disallowed on the
UAPI side in the future (per Alexei).
* v4: https://lore.kernel.org/bpf/20251030152451.62778-1-leon.hwang@linux.dev/
v3 -> v4:
* Target bpf-next tree.
* Address comments from Amery:
* Drop 'bpf_obj_free_fields()' in the path of updating local storage
maps without BPF_F_LOCK.
* Drop the corresponding self test.
* Respin the other test of local storage maps using syscall BPF
programs.
* v3: https://lore.kernel.org/bpf/20251026154000.34151-1-leon.hwang@linux.dev/
v2 -> v3:
* Free special fields when updating local storage maps without BPF_F_LOCK.
* Add test to verify the refcount is decremented when updating cgroup
local storage maps without BPF_F_LOCK.
* Address review from AI bot:
* Slow path with BPF_F_LOCK (around line 642-646) in
'bpf_local_storage.c'.
* v2: https://lore.kernel.org/bpf/20251020164608.20536-1-leon.hwang@linux.dev/
v1 -> v2:
* Add test to verify the refcount is decremented when updating cgroup
local storage maps with BPF_F_LOCK.
* Address review from AI bot:
* Fast path without bucket lock (around line 610) in
'bpf_local_storage.c'.
* v1: https://lore.kernel.org/bpf/20251016145801.47552-1-leon.hwang@linux.dev/
Leon Hwang (2):
bpf: Free special fields when update [lru_,]percpu_hash maps
selftests/bpf: Add test to verify freeing the special fields when
update [lru_,]percpu_hash maps
kernel/bpf/hashtab.c | 10 +++-
.../bpf/prog_tests/refcounted_kptr.c | 57 ++++++++++++++++++
.../selftests/bpf/progs/refcounted_kptr.c | 60 +++++++++++++++++++
3 files changed, 125 insertions(+), 2 deletions(-)
--
2.51.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH bpf-next v5 1/2] bpf: Free special fields when update [lru_,]percpu_hash maps
2025-11-04 14:27 [PATCH bpf-next v5 0/2] bpf: Free special fields when update [lru_,]percpu_hash maps Leon Hwang
@ 2025-11-04 14:27 ` Leon Hwang
2025-11-04 14:27 ` [PATCH bpf-next v5 2/2] selftests/bpf: Add test to verify freeing the " Leon Hwang
1 sibling, 0 replies; 6+ messages in thread
From: Leon Hwang @ 2025-11-04 14:27 UTC (permalink / raw)
To: bpf
Cc: ast, andrii, daniel, martin.lau, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, memxor, ameryhung,
linux-kernel, kernel-patches-bot, Leon Hwang
As [lru_,]percpu_hash maps support BPF_KPTR_{REF,PERCPU}, missing
calls to 'bpf_obj_free_fields()' in 'pcpu_copy_value()' could cause the
memory referenced by BPF_KPTR_{REF,PERCPU} fields to be held until the
map gets freed.
Fix this by calling 'bpf_obj_free_fields()' after
'copy_map_value[,_long]()' in 'pcpu_copy_value()'.
Fixes: 65334e64a493 ("bpf: Support kptrs in percpu hashmap and percpu LRU hashmap")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
kernel/bpf/hashtab.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index f876f09355f0d..c8a9b27f8663b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -934,15 +934,21 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
void *value, bool onallcpus)
{
+ void *ptr;
+
if (!onallcpus) {
/* copy true value_size bytes */
- copy_map_value(&htab->map, this_cpu_ptr(pptr), value);
+ ptr = this_cpu_ptr(pptr);
+ copy_map_value(&htab->map, ptr, value);
+ bpf_obj_free_fields(htab->map.record, ptr);
} else {
u32 size = round_up(htab->map.value_size, 8);
int off = 0, cpu;
for_each_possible_cpu(cpu) {
- copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value + off);
+ ptr = per_cpu_ptr(pptr, cpu);
+ copy_map_value_long(&htab->map, ptr, value + off);
+ bpf_obj_free_fields(htab->map.record, ptr);
off += size;
}
}
--
2.51.1
* [PATCH bpf-next v5 2/2] selftests/bpf: Add test to verify freeing the special fields when update [lru_,]percpu_hash maps
2025-11-04 14:27 [PATCH bpf-next v5 0/2] bpf: Free special fields when update [lru_,]percpu_hash maps Leon Hwang
2025-11-04 14:27 ` [PATCH bpf-next v5 1/2] " Leon Hwang
@ 2025-11-04 14:27 ` Leon Hwang
2025-11-04 14:52 ` bot+bpf-ci
1 sibling, 1 reply; 6+ messages in thread
From: Leon Hwang @ 2025-11-04 14:27 UTC (permalink / raw)
To: bpf
Cc: ast, andrii, daniel, martin.lau, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, memxor, ameryhung,
linux-kernel, kernel-patches-bot, Leon Hwang
Add test to verify that updating [lru_,]percpu_hash maps decrements the
refcount when BPF_KPTR_REF objects are involved.
The tests perform the following steps:
1. Call update_elem() to insert an initial value.
2. Use bpf_refcount_acquire() to increment the refcount.
3. Store the node pointer in the map value.
4. Add the node to a linked list.
5. Probe-read the refcount and verify it is *2*.
6. Call update_elem() again to trigger refcount decrement.
7. Probe-read the refcount and verify it is *1*.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
.../bpf/prog_tests/refcounted_kptr.c | 57 ++++++++++++++++++
.../selftests/bpf/progs/refcounted_kptr.c | 60 +++++++++++++++++++
2 files changed, 117 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index d6bd5e16e6372..b95551768d27b 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -44,3 +44,60 @@ void test_refcounted_kptr_wrong_owner(void)
ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_a2 retval");
refcounted_kptr__destroy(skel);
}
+
+void test_percpu_hash_kptr_refcount_leak(void)
+{
+ struct refcounted_kptr *skel;
+ int cpu_nr, fd, err, key = 0;
+ struct bpf_map *map;
+ size_t values_sz;
+ u64 *values;
+ LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .data_in = &pkt_v4,
+ .data_size_in = sizeof(pkt_v4),
+ .repeat = 1,
+ );
+
+ cpu_nr = libbpf_num_possible_cpus();
+ if (!ASSERT_GT(cpu_nr, 0, "libbpf_num_possible_cpus"))
+ return;
+
+ values = calloc(cpu_nr, sizeof(u64));
+ if (!ASSERT_OK_PTR(values, "calloc values"))
+ return;
+
+ skel = refcounted_kptr__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load")) {
+ free(values);
+ return;
+ }
+
+ values_sz = cpu_nr * sizeof(u64);
+ memset(values, 0, values_sz);
+
+ map = skel->maps.percpu_hash;
+ err = bpf_map__update_elem(map, &key, sizeof(key), values, values_sz, 0);
+ if (!ASSERT_OK(err, "bpf_map__update_elem"))
+ goto out;
+
+ fd = bpf_program__fd(skel->progs.percpu_hash_refcount_leak);
+ err = bpf_prog_test_run_opts(fd, &opts);
+ if (!ASSERT_OK(err, "bpf_prog_test_run_opts"))
+ goto out;
+ if (!ASSERT_EQ(opts.retval, 2, "opts.retval"))
+ goto out;
+
+ err = bpf_map__update_elem(map, &key, sizeof(key), values, values_sz, 0);
+ if (!ASSERT_OK(err, "bpf_map__update_elem"))
+ goto out;
+
+ fd = bpf_program__fd(skel->progs.check_percpu_hash_refcount);
+ err = bpf_prog_test_run_opts(fd, &opts);
+ ASSERT_OK(err, "bpf_prog_test_run_opts");
+ ASSERT_EQ(opts.retval, 1, "opts.retval");
+
+out:
+ refcounted_kptr__destroy(skel);
+ free(values);
+}
+
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
index 893a4fdb4b6e9..87b0cc018f840 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
@@ -568,4 +568,64 @@ int BPF_PROG(rbtree_sleepable_rcu_no_explicit_rcu_lock,
return 0;
}
+private(kptr_ref) u64 ref;
+
+static int probe_read_refcount(void)
+{
+ u32 refcount;
+
+ bpf_probe_read_kernel(&refcount, sizeof(refcount), (void *) ref);
+ return refcount;
+}
+
+static int __insert_in_list(struct bpf_list_head *head, struct bpf_spin_lock *lock,
+ struct node_data __kptr **node)
+{
+ struct node_data *n, *m;
+
+ n = bpf_obj_new(typeof(*n));
+ if (!n)
+ return -1;
+
+ m = bpf_refcount_acquire(n);
+ n = bpf_kptr_xchg(node, n);
+ if (n) {
+ bpf_obj_drop(n);
+ bpf_obj_drop(m);
+ return -2;
+ }
+
+ bpf_spin_lock(lock);
+ bpf_list_push_front(head, &m->l);
+ ref = (u64)(void *) &m->ref;
+ bpf_spin_unlock(lock);
+ return probe_read_refcount();
+}
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
+ __type(key, int);
+ __type(value, struct map_value);
+ __uint(max_entries, 1);
+} percpu_hash SEC(".maps");
+
+SEC("tc")
+int percpu_hash_refcount_leak(void *ctx)
+{
+ struct map_value *v;
+ int key = 0;
+
+ v = bpf_map_lookup_elem(&percpu_hash, &key);
+ if (!v)
+ return 0;
+
+ return __insert_in_list(&head, &lock, &v->node);
+}
+
+SEC("tc")
+int check_percpu_hash_refcount(void *ctx)
+{
+ return probe_read_refcount();
+}
+
char _license[] SEC("license") = "GPL";
--
2.51.1
* Re: [PATCH bpf-next v5 2/2] selftests/bpf: Add test to verify freeing the special fields when update [lru_,]percpu_hash maps
2025-11-04 14:27 ` [PATCH bpf-next v5 2/2] selftests/bpf: Add test to verify freeing the " Leon Hwang
@ 2025-11-04 14:52 ` bot+bpf-ci
2025-11-04 17:37 ` Alexei Starovoitov
0 siblings, 1 reply; 6+ messages in thread
From: bot+bpf-ci @ 2025-11-04 14:52 UTC (permalink / raw)
To: leon.hwang, bpf
Cc: ast, andrii, daniel, martin.lau, eddyz87, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, jolsa, memxor, ameryhung,
linux-kernel, kernel-patches-bot, leon.hwang, ast, andrii, daniel,
martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
> index 893a4fdb4..87b0cc018 100644
> --- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
> +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
[ ... ]
> @@ -568,4 +568,64 @@ int BPF_PROG(rbtree_sleepable_rcu_no_explicit_rcu_lock,
> return 0;
> }
>
> +private(kptr_ref) u64 ref;
> +
> +static int probe_read_refcount(void)
> +{
> + u32 refcount;
> +
> + bpf_probe_read_kernel(&refcount, sizeof(refcount), (void *) ref);
> + return refcount;
> +}
> +
> +static int __insert_in_list(struct bpf_list_head *head, struct bpf_spin_lock *lock,
> + struct node_data __kptr **node)
> +{
> + struct node_data *n, *m;
> +
> + n = bpf_obj_new(typeof(*n));
> + if (!n)
> + return -1;
> +
> + m = bpf_refcount_acquire(n);
> + n = bpf_kptr_xchg(node, n);
> + if (n) {
> + bpf_obj_drop(n);
> + bpf_obj_drop(m);
> + return -2;
> + }
In __insert_in_list(), after bpf_kptr_xchg() stores the new object in
the map and returns the old value in n, can the error path drop both
n and m? At this point, the new object (pointed to by m) is already
referenced by the map. Dropping m here would free an object that the
map still points to, leaving a dangling pointer.
The test itself never triggers this path since it initializes the map
with zeros first, so the kptr field starts as NULL. However, if this
test were extended or this pattern copied elsewhere, the incorrect
error handling could lead to a use-after-free.
> +
> + bpf_spin_lock(lock);
> + bpf_list_push_front(head, &m->l);
> + ref = (u64)(void *) &m->ref;
> + bpf_spin_unlock(lock);
> + return probe_read_refcount();
> +}
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19072261328
* Re: [PATCH bpf-next v5 2/2] selftests/bpf: Add test to verify freeing the special fields when update [lru_,]percpu_hash maps
2025-11-04 14:52 ` bot+bpf-ci
@ 2025-11-04 17:37 ` Alexei Starovoitov
2025-11-05 14:09 ` Leon Hwang
0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2025-11-04 17:37 UTC (permalink / raw)
To: bot+bpf-ci
Cc: Leon Hwang, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Kumar Kartikeya Dwivedi, Amery Hung, LKML,
kernel-patches-bot, Martin KaFai Lau, Chris Mason, Ihor Solodrai
On Tue, Nov 4, 2025 at 6:52 AM <bot+bpf-ci@kernel.org> wrote:
>
> > diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
> > index 893a4fdb4..87b0cc018 100644
> > --- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
> > +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
>
> [ ... ]
>
> > @@ -568,4 +568,64 @@ int BPF_PROG(rbtree_sleepable_rcu_no_explicit_rcu_lock,
> > return 0;
> > }
> >
> > +private(kptr_ref) u64 ref;
> > +
> > +static int probe_read_refcount(void)
> > +{
> > + u32 refcount;
> > +
> > + bpf_probe_read_kernel(&refcount, sizeof(refcount), (void *) ref);
> > + return refcount;
> > +}
> > +
> > +static int __insert_in_list(struct bpf_list_head *head, struct bpf_spin_lock *lock,
> > + struct node_data __kptr **node)
> > +{
> > + struct node_data *n, *m;
> > +
> > + n = bpf_obj_new(typeof(*n));
> > + if (!n)
> > + return -1;
> > +
> > + m = bpf_refcount_acquire(n);
> > + n = bpf_kptr_xchg(node, n);
> > + if (n) {
> > + bpf_obj_drop(n);
> > + bpf_obj_drop(m);
> > + return -2;
> > + }
>
> In __insert_in_list(), after bpf_kptr_xchg() stores the new object in
> the map and returns the old value in n, can the error path drop both
> n and m? At this point, the new object (pointed to by m) is already
> referenced by the map. Dropping m here would free an object that the
> map still points to, leaving a dangling pointer.
AI is wrong, but I bet it got confused by reuse of variable 'n'.
It's hard for humans too.
Leon,
please use a different var.
n = bpf_kptr_xchg(node, n); is a head scratcher.
Also see Yonghong's comment on v4 which I suspect applies to v5.
pw-bot: cr
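One possible shape for that rename (a sketch only; the variable names here
are illustrative and the actual change is up to the next revision) keeps
`n` for the new node and introduces `old` for the bpf_kptr_xchg() result:

```c
	m = bpf_refcount_acquire(n);
	old = bpf_kptr_xchg(node, n);
	if (old) {
		/* the map already held a node: n is now owned by the map,
		 * so only the previous node and the extra acquire are dropped
		 */
		bpf_obj_drop(old);
		bpf_obj_drop(m);
		return -2;
	}
```

With a distinct `old`, it is immediately visible that the error path drops
the previously stored node plus the extra reference, not the node the map
now points to.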
* Re: [PATCH bpf-next v5 2/2] selftests/bpf: Add test to verify freeing the special fields when update [lru_,]percpu_hash maps
2025-11-04 17:37 ` Alexei Starovoitov
@ 2025-11-05 14:09 ` Leon Hwang
0 siblings, 0 replies; 6+ messages in thread
From: Leon Hwang @ 2025-11-05 14:09 UTC (permalink / raw)
To: Alexei Starovoitov, bot+bpf-ci
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Kumar Kartikeya Dwivedi, Amery Hung, LKML, kernel-patches-bot,
Martin KaFai Lau, Chris Mason, Ihor Solodrai
On 2025/11/5 01:37, Alexei Starovoitov wrote:
> On Tue, Nov 4, 2025 at 6:52 AM <bot+bpf-ci@kernel.org> wrote:
>>
>>> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
>>> index 893a4fdb4..87b0cc018 100644
>>> --- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
>>> +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
>>
>> [ ... ]
>>
>>> @@ -568,4 +568,64 @@ int BPF_PROG(rbtree_sleepable_rcu_no_explicit_rcu_lock,
>>> return 0;
>>> }
>>>
>>> +private(kptr_ref) u64 ref;
>>> +
>>> +static int probe_read_refcount(void)
>>> +{
>>> + u32 refcount;
>>> +
>>> + bpf_probe_read_kernel(&refcount, sizeof(refcount), (void *) ref);
>>> + return refcount;
>>> +}
>>> +
>>> +static int __insert_in_list(struct bpf_list_head *head, struct bpf_spin_lock *lock,
>>> + struct node_data __kptr **node)
>>> +{
>>> + struct node_data *n, *m;
>>> +
>>> + n = bpf_obj_new(typeof(*n));
>>> + if (!n)
>>> + return -1;
>>> +
>>> + m = bpf_refcount_acquire(n);
>>> + n = bpf_kptr_xchg(node, n);
>>> + if (n) {
>>> + bpf_obj_drop(n);
>>> + bpf_obj_drop(m);
>>> + return -2;
>>> + }
>>
>> In __insert_in_list(), after bpf_kptr_xchg() stores the new object in
>> the map and returns the old value in n, can the error path drop both
>> n and m? At this point, the new object (pointed to by m) is already
>> referenced by the map. Dropping m here would free an object that the
>> map still points to, leaving a dangling pointer.
>
> AI is wrong, but I bet it got confused by reuse of variable 'n'.
> It's hard for humans too.
> Leon,
> please use a different var.
> n = bpf_kptr_xchg(node, n); is a head scratcher.
No problem.
I'll update the variable names in the next revision.
>
> Also see Yonghong's comment on v4 which I suspect applies to v5.
That was actually a misunderstanding — he didn't run the newly added tests.
Still, I'll update the test name to include "refcounted_kptr" to make it
clearer and help avoid such confusion in the future.
Thanks,
Leon