* [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()
@ 2025-07-29 16:56 Arnaud Lecomte
2025-07-29 22:45 ` Yonghong Song
0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-07-29 16:56 UTC (permalink / raw)
To: song, jolsa, ast, daniel, andrii, martin.lau, eddyz87,
yonghong.song, john.fastabend, kpsingh, sdf, haoluo
Cc: bpf, linux-kernel, syzkaller-bugs, syzbot+c9b724fbb41cf2538b7b,
Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
For build_id mode, we use sizeof(struct bpf_stack_build_id)
to determine capacity, and for normal mode we use sizeof(u64).
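As an illustration (numbers picked arbitrarily): a stack map created with
value_size = 256 holds 256 / sizeof(u64) = 32 entries per bucket in normal
mode, so a perf trace with 40 entries must be clamped to 32 before it is
copied into the bucket; in build_id mode the same value_size holds fewer
entries, since each one is a struct bpf_stack_build_id.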
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Tested-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
Changes in v2:
- Use the stack_map_data_size() helper to compute the stack map element size
---
kernel/bpf/stackmap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..6f225d477f07 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
- u32 hash, id, trace_nr, trace_len, i;
+ u32 hash, id, trace_nr, trace_len, i, max_depth;
bool user = flags & BPF_F_USER_STACK;
u64 *ips;
bool hash_matches;
@@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map *map,
trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+
+ /* Clamp the trace to max allowed depth */
+ max_depth = smap->map.value_size / stack_map_data_size(map);
+ if (trace_nr > max_depth)
+ trace_nr = max_depth;
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
--
2.43.0
* Re: [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-07-29 16:56 [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-07-29 22:45 ` Yonghong Song
2025-07-30 7:10 ` Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2025-07-29 22:45 UTC (permalink / raw)
To: Arnaud Lecomte, song, jolsa, ast, daniel, andrii, martin.lau,
eddyz87, john.fastabend, kpsingh, sdf, haoluo
Cc: bpf, linux-kernel, syzkaller-bugs, syzbot+c9b724fbb41cf2538b7b
On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
> For build_id mode, we use sizeof(struct bpf_stack_build_id)
> to determine capacity, and for normal mode we use sizeof(u64).
>
> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Tested-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
Could you add a selftest? This way folks can easily find out what the
problem is and why this fix solves the issue correctly.
> ---
> Changes in v2:
> - Use the stack_map_data_size() helper to compute the stack map element size
> ---
> kernel/bpf/stackmap.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..6f225d477f07 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
> struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> - u32 hash, id, trace_nr, trace_len, i;
> + u32 hash, id, trace_nr, trace_len, i, max_depth;
> bool user = flags & BPF_F_USER_STACK;
> u64 *ips;
> bool hash_matches;
> @@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map *map,
>
> trace_nr = trace->nr - skip;
> trace_len = trace_nr * sizeof(u64);
> +
> + /* Clamp the trace to max allowed depth */
> + max_depth = smap->map.value_size / stack_map_data_size(map);
> + if (trace_nr > max_depth)
> + trace_nr = max_depth;
> +
> ips = trace->ip + skip;
> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
> id = hash & (smap->n_buckets - 1);
* Re: [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-07-29 22:45 ` Yonghong Song
@ 2025-07-30 7:10 ` Arnaud Lecomte
2025-08-01 18:16 ` Lecomte, Arnaud
0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-07-30 7:10 UTC (permalink / raw)
To: Yonghong Song, song, jolsa, ast, daniel, andrii, martin.lau,
eddyz87, john.fastabend, kpsingh, sdf, haoluo
Cc: bpf, linux-kernel, syzkaller-bugs, syzbot+c9b724fbb41cf2538b7b
On 29/07/2025 23:45, Yonghong Song wrote:
>
>
> On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>> contains more stack entries than the stack map bucket can hold,
>> leading to an out-of-bounds write in the bucket's data array.
>> For build_id mode, we use sizeof(struct bpf_stack_build_id)
>> to determine capacity, and for normal mode we use sizeof(u64).
>>
>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Tested-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>
> Could you add a selftest? This way folks can easily find out what the
> problem is and why this fix solves the issue correctly.
>
Sure, will be done after work
Thanks,
Arnaud
>> ---
>> Changes in v2:
>> - Use the stack_map_data_size() helper to compute the stack map element size
>> ---
>> kernel/bpf/stackmap.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 3615c06b7dfa..6f225d477f07 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
>> struct bpf_stack_map *smap = container_of(map, struct
>> bpf_stack_map, map);
>> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
>> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> - u32 hash, id, trace_nr, trace_len, i;
>> + u32 hash, id, trace_nr, trace_len, i, max_depth;
>> bool user = flags & BPF_F_USER_STACK;
>> u64 *ips;
>> bool hash_matches;
>> @@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map *map,
>> trace_nr = trace->nr - skip;
>> trace_len = trace_nr * sizeof(u64);
>> +
>> + /* Clamp the trace to max allowed depth */
>> + max_depth = smap->map.value_size / stack_map_data_size(map);
>> + if (trace_nr > max_depth)
>> + trace_nr = max_depth;
>> +
>> ips = trace->ip + skip;
>> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
>> id = hash & (smap->n_buckets - 1);
>
>
* Re: [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-07-30 7:10 ` Arnaud Lecomte
@ 2025-08-01 18:16 ` Lecomte, Arnaud
2025-08-05 20:49 ` Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Lecomte, Arnaud @ 2025-08-01 18:16 UTC (permalink / raw)
To: Yonghong Song, song, jolsa, ast, daniel, andrii, martin.lau,
eddyz87, john.fastabend, kpsingh, sdf, haoluo
Cc: bpf, linux-kernel, syzkaller-bugs, syzbot+c9b724fbb41cf2538b7b
Well, it turns out it is less straightforward than it looked to detect
the memory corruption without KASAN. I am currently on holiday for the
next 3 days, so I have limited access to a computer. I should be able
to sort this out on Monday.
Thanks,
Arnaud
On 30/07/2025 08:10, Arnaud Lecomte wrote:
> On 29/07/2025 23:45, Yonghong Song wrote:
>>
>>
>> On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
>>> Syzkaller reported a KASAN slab-out-of-bounds write in
>>> __bpf_get_stackid()
>>> when copying stack trace data. The issue occurs when the perf trace
>>> contains more stack entries than the stack map bucket can hold,
>>> leading to an out-of-bounds write in the bucket's data array.
>>> For build_id mode, we use sizeof(struct bpf_stack_build_id)
>>> to determine capacity, and for normal mode we use sizeof(u64).
>>>
>>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>>> Tested-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>>
>> Could you add a selftest? This way folks can easily find out what the
>> problem is and why this fix solves the issue correctly.
>>
> Sure, will be done after work
> Thanks,
> Arnaud
>>> ---
>>> Changes in v2:
>>> - Use the stack_map_data_size() helper to compute the stack map element size
>>> ---
>>> kernel/bpf/stackmap.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>>> index 3615c06b7dfa..6f225d477f07 100644
>>> --- a/kernel/bpf/stackmap.c
>>> +++ b/kernel/bpf/stackmap.c
>>> @@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
>>> struct bpf_stack_map *smap = container_of(map, struct
>>> bpf_stack_map, map);
>>> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
>>> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>>> - u32 hash, id, trace_nr, trace_len, i;
>>> + u32 hash, id, trace_nr, trace_len, i, max_depth;
>>> bool user = flags & BPF_F_USER_STACK;
>>> u64 *ips;
>>> bool hash_matches;
>>> @@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map *map,
>>> trace_nr = trace->nr - skip;
>>> trace_len = trace_nr * sizeof(u64);
>>> +
>>> + /* Clamp the trace to max allowed depth */
>>> + max_depth = smap->map.value_size / stack_map_data_size(map);
>>> + if (trace_nr > max_depth)
>>> + trace_nr = max_depth;
>>> +
>>> ips = trace->ip + skip;
>>> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
>>> id = hash & (smap->n_buckets - 1);
>>
>>
* Re: [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-01 18:16 ` Lecomte, Arnaud
@ 2025-08-05 20:49 ` Arnaud Lecomte
2025-08-06 1:52 ` Yonghong Song
0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-05 20:49 UTC (permalink / raw)
To: Yonghong Song, song, jolsa, ast, daniel, andrii, martin.lau,
eddyz87, john.fastabend, kpsingh, sdf, haoluo
Cc: bpf, linux-kernel, syzkaller-bugs, syzbot+c9b724fbb41cf2538b7b
Hi,
I gave it several tries and I can't find a nice way to do it properly.
The main challenge is to find a way to detect the memory corruption. I
wanted to place a canary value by tweaking the map size, but we don't
have a way, from a BPF program's perspective, to access the size of a
stack_map_bucket. If we decide to do this computation manually, we
would end up with maintainability issues:
#include "vmlinux.h"
#include "bpf/bpf_helpers.h"
#define MAX_STACK_DEPTH 32
#define CANARY_VALUE 0xBADCAFE
/* Calculate size based on known layout:
* - fnode: sizeof(void*)
* - hash: 4 bytes
* - nr: 4 bytes
* - data: MAX_STACK_DEPTH * 8 bytes
* - canary: 8 bytes
*/
#define VALUE_SIZE (sizeof(void*) + 4 + 4 + (MAX_STACK_DEPTH * 8) + 8)
struct {
__uint(type, BPF_MAP_TYPE_STACK_TRACE);
__uint(max_entries, 1);
__uint(value_size, VALUE_SIZE);
__uint(key_size, sizeof(u32));
} stackmap SEC(".maps");
static __attribute__((noinline)) void recursive_helper(int depth) {
if (depth <= 0) return;
asm volatile("" ::: "memory");
recursive_helper(depth - 1);
}
SEC("kprobe/do_sys_open")
int test_stack_overflow(void *ctx) {
u32 key = 0;
u64 *stack = bpf_map_lookup_elem(&stackmap, &key);
if (!stack) return 0;
stack[MAX_STACK_DEPTH] = CANARY_VALUE;
/* Force minimum stack depth */
recursive_helper(MAX_STACK_DEPTH + 10);
(void)bpf_get_stackid(ctx, &stackmap, 0);
return 0;
}
char _license[] SEC("license") = "GPL";
On 01/08/2025 19:16, Lecomte, Arnaud wrote:
> Well, it turns out it is less straightforward than it looked to detect
> the memory corruption without KASAN. I am currently on holiday for the
> next 3 days, so I have limited access to a computer. I should be able
> to sort this out on Monday.
>
> Thanks,
> Arnaud
>
> On 30/07/2025 08:10, Arnaud Lecomte wrote:
>> On 29/07/2025 23:45, Yonghong Song wrote:
>>>
>>>
>>> On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
>>>> Syzkaller reported a KASAN slab-out-of-bounds write in
>>>> __bpf_get_stackid()
>>>> when copying stack trace data. The issue occurs when the perf trace
>>>> contains more stack entries than the stack map bucket can hold,
>>>> leading to an out-of-bounds write in the bucket's data array.
>>>> For build_id mode, we use sizeof(struct bpf_stack_build_id)
>>>> to determine capacity, and for normal mode we use sizeof(u64).
>>>>
>>>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>>>> Tested-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>>>
>>> Could you add a selftest? This way folks can easily find out what the
>>> problem is and why this fix solves the issue correctly.
>>>
>> Sure, will be done after work
>> Thanks,
>> Arnaud
>>>> ---
>>>> Changes in v2:
>>>> - Use the stack_map_data_size() helper to compute the stack map element size
>>>> ---
>>>> kernel/bpf/stackmap.c | 8 +++++++-
>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>>>> index 3615c06b7dfa..6f225d477f07 100644
>>>> --- a/kernel/bpf/stackmap.c
>>>> +++ b/kernel/bpf/stackmap.c
>>>> @@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
>>>> struct bpf_stack_map *smap = container_of(map, struct
>>>> bpf_stack_map, map);
>>>> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
>>>> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>>>> - u32 hash, id, trace_nr, trace_len, i;
>>>> + u32 hash, id, trace_nr, trace_len, i, max_depth;
>>>> bool user = flags & BPF_F_USER_STACK;
>>>> u64 *ips;
>>>> bool hash_matches;
>>>> @@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map
>>>> *map,
>>>> trace_nr = trace->nr - skip;
>>>> trace_len = trace_nr * sizeof(u64);
>>>> +
>>>> + /* Clamp the trace to max allowed depth */
>>>> + max_depth = smap->map.value_size / stack_map_data_size(map);
>>>> + if (trace_nr > max_depth)
>>>> + trace_nr = max_depth;
>>>> +
>>>> ips = trace->ip + skip;
>>>> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
>>>> id = hash & (smap->n_buckets - 1);
>>>
>>>
* Re: [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-05 20:49 ` Arnaud Lecomte
@ 2025-08-06 1:52 ` Yonghong Song
2025-08-07 17:50 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2025-08-06 1:52 UTC (permalink / raw)
To: Arnaud Lecomte, song, jolsa, ast, daniel, andrii, martin.lau,
eddyz87, john.fastabend, kpsingh, sdf, haoluo
Cc: bpf, linux-kernel, syzkaller-bugs, syzbot+c9b724fbb41cf2538b7b
On 8/5/25 1:49 PM, Arnaud Lecomte wrote:
> Hi,
> I gave it several tries and I can't find a nice way to do it properly.
> The main challenge is to find a way to detect the memory corruption. I
> wanted to place a canary value by tweaking the map size, but we don't
> have a way, from a BPF program's perspective, to access the size of a
> stack_map_bucket. If we decide to do this computation manually, we
> would end up with maintainability issues:
> #include "vmlinux.h"
> #include "bpf/bpf_helpers.h"
>
> #define MAX_STACK_DEPTH 32
> #define CANARY_VALUE 0xBADCAFE
>
> /* Calculate size based on known layout:
> * - fnode: sizeof(void*)
> * - hash: 4 bytes
> * - nr: 4 bytes
> * - data: MAX_STACK_DEPTH * 8 bytes
> * - canary: 8 bytes
> */
> #define VALUE_SIZE (sizeof(void*) + 4 + 4 + (MAX_STACK_DEPTH * 8) + 8)
>
> struct {
> __uint(type, BPF_MAP_TYPE_STACK_TRACE);
> __uint(max_entries, 1);
> __uint(value_size, VALUE_SIZE);
> __uint(key_size, sizeof(u32));
> } stackmap SEC(".maps");
>
> static __attribute__((noinline)) void recursive_helper(int depth) {
> if (depth <= 0) return;
> asm volatile("" ::: "memory");
> recursive_helper(depth - 1);
> }
>
> SEC("kprobe/do_sys_open")
> int test_stack_overflow(void *ctx) {
> u32 key = 0;
> u64 *stack = bpf_map_lookup_elem(&stackmap, &key);
> if (!stack) return 0;
>
> stack[MAX_STACK_DEPTH] = CANARY_VALUE;
>
> /* Force minimum stack depth */
> recursive_helper(MAX_STACK_DEPTH + 10);
>
> (void)bpf_get_stackid(ctx, &stackmap, 0);
> return 0;
> }
>
> char _license[] SEC("license") = "GPL";
It looks like it is hard to trigger the memory corruption inside the kernel.
Maybe KASAN can detect it for your specific example.
If you go without selftests, you can do the following:
__bpf_get_stack() already solves the problem you are trying to fix.
I suggest you refactor some portions of the code in __bpf_get_stack()
to set trace_nr properly, and then you can use that refactored function
in __bpf_get_stackid(). So two patches:
1. refactor the portion of code (related to elem_size/trace_nr) in __bpf_get_stack().
2. fix the issue in __bpf_get_stackid() with the newly created function.
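Roughly something like the sketch below (untested, just to illustrate the
idea; the exact shape is up to you):

/* entries that fit in 'size' bytes, plus the skipped ones,
 * capped at sysctl_perf_event_max_stack
 */
static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
{
	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
	u32 max_depth = size / elem_size + skip;

	return min_t(u32, max_depth, sysctl_perf_event_max_stack);
}

Then both __bpf_get_stack() and __bpf_get_stackid() can clamp with
trace_nr = min(trace_nr, max_depth - skip) before hashing/copying.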
>
> On 01/08/2025 19:16, Lecomte, Arnaud wrote:
>> Well, it turns out it is less straightforward than it looked to detect
>> the memory corruption without KASAN. I am currently on holiday for the
>> next 3 days, so I have limited access to a computer. I should be able
>> to sort this out on Monday.
>>
>> Thanks,
>> Arnaud
>>
>> On 30/07/2025 08:10, Arnaud Lecomte wrote:
>>> On 29/07/2025 23:45, Yonghong Song wrote:
>>>>
>>>>
>>>> On 7/29/25 9:56 AM, Arnaud Lecomte wrote:
>>>>> Syzkaller reported a KASAN slab-out-of-bounds write in
>>>>> __bpf_get_stackid()
>>>>> when copying stack trace data. The issue occurs when the perf trace
>>>>> contains more stack entries than the stack map bucket can hold,
>>>>> leading to an out-of-bounds write in the bucket's data array.
>>>>> For build_id mode, we use sizeof(struct bpf_stack_build_id)
>>>>> to determine capacity, and for normal mode we use sizeof(u64).
>>>>>
>>>>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>>>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>>>>> Tested-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>>>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>>>>
>>>> Could you add a selftest? This way folks can easily find out what the
>>>> problem is and why this fix solves the issue correctly.
>>>>
>>> Sure, will be done after work
>>> Thanks,
>>> Arnaud
>>>>> ---
>>>>> Changes in v2:
>>>>> - Use the stack_map_data_size() helper to compute the stack map element size
>>>>> ---
>>>>> kernel/bpf/stackmap.c | 8 +++++++-
>>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>>>>> index 3615c06b7dfa..6f225d477f07 100644
>>>>> --- a/kernel/bpf/stackmap.c
>>>>> +++ b/kernel/bpf/stackmap.c
>>>>> @@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map
>>>>> *map,
>>>>> struct bpf_stack_map *smap = container_of(map, struct
>>>>> bpf_stack_map, map);
>>>>> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
>>>>> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>>>>> - u32 hash, id, trace_nr, trace_len, i;
>>>>> + u32 hash, id, trace_nr, trace_len, i, max_depth;
>>>>> bool user = flags & BPF_F_USER_STACK;
>>>>> u64 *ips;
>>>>> bool hash_matches;
>>>>> @@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map
>>>>> *map,
>>>>> trace_nr = trace->nr - skip;
>>>>> trace_len = trace_nr * sizeof(u64);
>>>>> +
>>>>> + /* Clamp the trace to max allowed depth */
>>>>> + max_depth = smap->map.value_size / stack_map_data_size(map);
>>>>> + if (trace_nr > max_depth)
>>>>> + trace_nr = max_depth;
>>>>> +
>>>>> ips = trace->ip + skip;
>>>>> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
>>>>> id = hash & (smap->n_buckets - 1);
>>>>
>>>>
* [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-06 1:52 ` Yonghong Song
@ 2025-08-07 17:50 ` Arnaud Lecomte
2025-08-07 17:52 ` [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-07 17:50 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, contact, daniel, eddyz87, haoluo,
john.fastabend, jolsa, kpsingh, linux-kernel, martin.lau, sdf,
song, syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..14e034045310 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}
+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @map_size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @map_flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored,
+ * or -EINVAL if size is not a multiple of elem_size
+ */
+static u32 stack_map_calculate_max_depth(u32 map_size, u32 map_elem_size, u64 map_flags)
+{
+ u32 max_depth;
+ u32 skip = map_flags & BPF_F_SKIP_FIELD_MASK;
+
+ if (unlikely(map_size%map_elem_size))
+ return -EINVAL;
+
+ max_depth = map_size / map_elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -423,8 +448,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
- if (unlikely(size % elem_size))
- goto clear;
/* cannot get valid user stack for task without user_mode regs */
if (task && user && !user_mode(regs))
@@ -438,10 +461,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}
- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
+ if (max_depth < 0)
+ goto err_fault;
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +483,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}
trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;
ips = trace->ip + skip;
--
2.43.0
* [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-07 17:50 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
@ 2025-08-07 17:52 ` Arnaud Lecomte
2025-08-07 19:05 ` Yonghong Song
2025-08-07 19:01 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
2025-08-08 7:30 ` [syzbot ci] " syzbot ci
2 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-07 17:52 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 14e034045310..d7ef840971f0 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -250,7 +250,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}
static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -266,6 +266,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -325,19 +327,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;
if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;
- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ if (max_depth < 0)
+ return -EFAULT;
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -346,7 +348,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;
- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}
const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -378,6 +380,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, pe_max_depth;
/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -396,24 +399,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;
trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
/* restore nr */
trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
-
skip += nr_kernel;
if (skip > BPF_F_SKIP_FIELD_MASK)
return -EFAULT;
flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
}
return ret;
}
--
2.43.0
* Re: [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-07 17:50 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
2025-08-07 17:52 ` [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-08-07 19:01 ` Yonghong Song
2025-08-07 19:07 ` Yonghong Song
2025-08-08 7:30 ` [syzbot ci] " syzbot ci
2 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2025-08-07 19:01 UTC (permalink / raw)
To: Arnaud Lecomte
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/7/25 10:50 AM, Arnaud Lecomte wrote:
> Add a new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
> 1 file changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..14e034045310 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @map_size: Size of the buffer/map value in bytes
> + * @elem_size: Size of each stack trace element
> + * @map_flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored,
> + * or -EINVAL if size is not a multiple of elem_size
-EINVAL is not needed here. See below.
> + */
> +static u32 stack_map_calculate_max_depth(u32 map_size, u32 map_elem_size, u64 map_flags)
map_elem_size -> elem_size
> +{
> + u32 max_depth;
> + u32 skip = map_flags & BPF_F_SKIP_FIELD_MASK;
reverse Christmas tree?
> +
> + if (unlikely(map_size%map_elem_size))
> + return -EINVAL;
The above should not be here. The checking 'map_size % map_elem_size' is only needed
for bpf_get_stack(), not applicable for bpf_get_stackid().
> +
> + max_depth = map_size / map_elem_size;
> + max_depth += skip;
> + if (max_depth > sysctl_perf_event_max_stack)
> + return sysctl_perf_event_max_stack;
> +
> + return max_depth;
> +}
> +
> static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
> {
> u64 elem_size = sizeof(struct stack_map_bucket) +
> @@ -406,7 +431,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> struct perf_callchain_entry *trace_in,
> void *buf, u32 size, u64 flags, bool may_fault)
> {
> - u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> + u32 trace_nr, copy_len, elem_size, max_depth;
> bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> bool crosstask = task && task != current;
> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> @@ -423,8 +448,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
>
> elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
> - if (unlikely(size % elem_size))
> - goto clear;
Please keep this one.
>
> /* cannot get valid user stack for task without user_mode regs */
> if (task && user && !user_mode(regs))
> @@ -438,10 +461,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
> }
>
> - num_elem = size / elem_size;
> - max_depth = num_elem + skip;
> - if (sysctl_perf_event_max_stack < max_depth)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
> + if (max_depth < 0)
> + goto err_fault;
max_depth is never less than 0.
>
> if (may_fault)
> rcu_read_lock(); /* need RCU for perf's callchain below */
> @@ -461,7 +483,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> }
>
> trace_nr = trace->nr - skip;
> - trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
> + trace_nr = min(trace_nr, max_depth - skip);
> copy_len = trace_nr * elem_size;
>
> ips = trace->ip + skip;
* Re: [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-07 17:52 ` [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-08-07 19:05 ` Yonghong Song
0 siblings, 0 replies; 28+ messages in thread
From: Yonghong Song @ 2025-08-07 19:05 UTC (permalink / raw)
To: Arnaud Lecomte
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/7/25 10:52 AM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
>
> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 26 +++++++++++++++-----------
> 1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 14e034045310..d7ef840971f0 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -250,7 +250,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
> }
>
> static long __bpf_get_stackid(struct bpf_map *map,
> - struct perf_callchain_entry *trace, u64 flags)
> + struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
> {
> struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> @@ -266,6 +266,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
>
> trace_nr = trace->nr - skip;
> trace_len = trace_nr * sizeof(u64);
> + trace_nr = min(trace_nr, max_depth - skip);
> +
> ips = trace->ip + skip;
> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
> id = hash & (smap->n_buckets - 1);
> @@ -325,19 +327,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
> BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> u64, flags)
> {
> - u32 max_depth = map->value_size / stack_map_data_size(map);
> - u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> + u32 elem_size = stack_map_data_size(map);
> bool user = flags & BPF_F_USER_STACK;
> struct perf_callchain_entry *trace;
> bool kernel = !user;
> + u32 max_depth;
>
> if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
> return -EINVAL;
>
> - max_depth += skip;
> - if (max_depth > sysctl_perf_event_max_stack)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + if (max_depth < 0)
> + return -EFAULT;
the above condition is not needed; max_depth is a u32 and can never be negative.
>
> trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> false, false);
> @@ -346,7 +348,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> /* couldn't fetch the stack trace */
> return -EFAULT;
>
> - return __bpf_get_stackid(map, trace, flags);
> + return __bpf_get_stackid(map, trace, flags, max_depth);
> }
>
> const struct bpf_func_proto bpf_get_stackid_proto = {
> @@ -378,6 +380,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> bool kernel, user;
> __u64 nr_kernel;
> int ret;
> + u32 elem_size, pe_max_depth;
pe_max_depth -> max_depth.
>
> /* perf_sample_data doesn't have callchain, use bpf_get_stackid */
> if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
> @@ -396,24 +399,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> return -EFAULT;
>
> nr_kernel = count_kernel_ip(trace);
> -
> + elem_size = stack_map_data_size(map);
> if (kernel) {
> __u64 nr = trace->nr;
>
> trace->nr = nr_kernel;
> - ret = __bpf_get_stackid(map, trace, flags);
> + pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
>
> /* restore nr */
> trace->nr = nr;
> } else { /* user */
> u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
> -
please keep an empty line here.
> skip += nr_kernel;
> if (skip > BPF_F_SKIP_FIELD_MASK)
> return -EFAULT;
>
> flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
> - ret = __bpf_get_stackid(map, trace, flags);
> + pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
> }
> return ret;
> }
* Re: [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-07 19:01 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
@ 2025-08-07 19:07 ` Yonghong Song
2025-08-09 11:56 ` [PATCH v2 " Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2025-08-07 19:07 UTC (permalink / raw)
To: Arnaud Lecomte
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/7/25 12:01 PM, Yonghong Song wrote:
>
>
> On 8/7/25 10:50 AM, Arnaud Lecomte wrote:
>> Add a new helper function stack_map_calculate_max_depth() that
>> computes the max depth for a stackmap.
>>
>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>> ---
>> kernel/bpf/stackmap.c | 38 ++++++++++++++++++++++++++++++--------
>> 1 file changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 3615c06b7dfa..14e034045310 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -42,6 +42,31 @@ static inline int stack_map_data_size(struct
>> bpf_map *map)
>> sizeof(struct bpf_stack_build_id) : sizeof(u64);
>> }
>> +/**
>> + * stack_map_calculate_max_depth - Calculate maximum allowed stack
>> trace depth
>> + * @map_size: Size of the buffer/map value in bytes
>> + * @elem_size: Size of each stack trace element
>> + * @map_flags: BPF stack trace flags (BPF_F_USER_STACK,
>> BPF_F_USER_BUILD_ID, ...)
One more thing: map_flags -> flags, as 'flags' is used in bpf_get_stackid/bpf_get_stack etc.
>> + *
>> + * Return: Maximum number of stack trace entries that can be safely
>> stored,
>> + * or -EINVAL if size is not a multiple of elem_size
>
> -EINVAL is not needed here. See below.
[...]
* [syzbot ci] Re: bpf: refactor max_depth computation in bpf_get_stack()
2025-08-07 17:50 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
2025-08-07 17:52 ` [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-07 19:01 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
@ 2025-08-08 7:30 ` syzbot ci
2 siblings, 0 replies; 28+ messages in thread
From: syzbot ci @ 2025-08-08 7:30 UTC (permalink / raw)
To: andrii, ast, bpf, contact, daniel, eddyz87, haoluo,
john.fastabend, jolsa, kpsingh, linux-kernel, martin.lau, sdf,
song, syzbot, syzkaller-bugs, yonghong.song
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] bpf: refactor max_depth computation in bpf_get_stack()
https://lore.kernel.org/all/20250807175032.7381-1-contact@arnaud-lcm.com
* [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack()
* [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
and found the following issues:
* KASAN: stack-out-of-bounds Write in __bpf_get_stack
* PANIC: double fault in its_return_thunk
Full report is available here:
https://ci.syzbot.org/series/2af1b227-99e3-4e64-ac23-827848a4b8a5
***
KASAN: stack-out-of-bounds Write in __bpf_get_stack
tree: bpf-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base: f3af62b6cee8af9f07012051874af2d2a451f0e5
arch: amd64
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
config: https://ci.syzbot.org/builds/5e5c6698-7b84-4bf2-a1ee-1b6223c8d4c3/config
C repro: https://ci.syzbot.org/findings/1355d710-d133-43fd-9061-18b2de6844a4/c_repro
syz repro: https://ci.syzbot.org/findings/1355d710-d133-43fd-9061-18b2de6844a4/syz_repro
netdevsim netdevsim1 netdevsim0: renamed from eth0
netdevsim netdevsim1 netdevsim1: renamed from eth1
==================================================================
BUG: KASAN: stack-out-of-bounds in __bpf_get_stack+0x54a/0xa70 kernel/bpf/stackmap.c:501
Write of size 208 at addr ffffc90003655ee8 by task syz-executor/5952
CPU: 1 UID: 0 PID: 5952 Comm: syz-executor Not tainted 6.16.0-syzkaller-11113-gf3af62b6cee8-dirty #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:189
__asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106
__bpf_get_stack+0x54a/0xa70 kernel/bpf/stackmap.c:501
____bpf_get_stack kernel/bpf/stackmap.c:525 [inline]
bpf_get_stack+0x33/0x50 kernel/bpf/stackmap.c:522
____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1835 [inline]
bpf_get_stack_raw_tp+0x1a9/0x220 kernel/trace/bpf_trace.c:1825
bpf_prog_4e330ebee64cb698+0x43/0x4b
bpf_dispatcher_nop_func include/linux/bpf.h:1332 [inline]
__bpf_prog_run include/linux/filter.h:718 [inline]
bpf_prog_run include/linux/filter.h:725 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2257 [inline]
bpf_trace_run10+0x2e4/0x500 kernel/trace/bpf_trace.c:2306
__bpf_trace_percpu_alloc_percpu+0x364/0x400 include/trace/events/percpu.h:11
__do_trace_percpu_alloc_percpu include/trace/events/percpu.h:11 [inline]
trace_percpu_alloc_percpu include/trace/events/percpu.h:11 [inline]
pcpu_alloc_noprof+0x1534/0x16b0 mm/percpu.c:1892
fib_nh_common_init+0x9c/0x3b0 net/ipv4/fib_semantics.c:620
fib6_nh_init+0x1608/0x1ff0 net/ipv6/route.c:3671
ip6_route_info_create_nh+0x16a/0xab0 net/ipv6/route.c:3892
ip6_route_add+0x6e/0x1b0 net/ipv6/route.c:3944
addrconf_add_mroute net/ipv6/addrconf.c:2552 [inline]
addrconf_add_dev+0x24f/0x340 net/ipv6/addrconf.c:2570
addrconf_dev_config net/ipv6/addrconf.c:3479 [inline]
addrconf_init_auto_addrs+0x57c/0xa30 net/ipv6/addrconf.c:3567
addrconf_notify+0xacc/0x1010 net/ipv6/addrconf.c:3740
notifier_call_chain+0x1b6/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2267 [inline]
call_netdevice_notifiers net/core/dev.c:2281 [inline]
__dev_notify_flags+0x18d/0x2e0 net/core/dev.c:-1
netif_change_flags+0xe8/0x1a0 net/core/dev.c:9608
do_setlink+0xc55/0x41c0 net/core/rtnetlink.c:3143
rtnl_changelink net/core/rtnetlink.c:3761 [inline]
__rtnl_newlink net/core/rtnetlink.c:3920 [inline]
rtnl_newlink+0x160b/0x1c70 net/core/rtnetlink.c:4057
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6946
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2552
netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1346
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:714 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:729
__sys_sendto+0x3bd/0x520 net/socket.c:2228
__do_sys_sendto net/socket.c:2235 [inline]
__se_sys_sendto net/socket.c:2231 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2231
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fec5c790a7c
Code: 2a 5f 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 70 5f 02 00 48 8b
RSP: 002b:00007fff7b55f7b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fec5d4e35c0 RCX: 00007fec5c790a7c
RDX: 0000000000000030 RSI: 00007fec5d4e3610 RDI: 0000000000000006
RBP: 0000000000000000 R08: 00007fff7b55f804 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000006
R13: 0000000000000000 R14: 00007fec5d4e3610 R15: 0000000000000000
</TASK>
The buggy address belongs to stack of task syz-executor/5952
and is located at offset 296 in frame:
__bpf_get_stack+0x0/0xa70 include/linux/mmap_lock.h:-1
This frame has 1 object:
[32, 36) 'rctx.i'
The buggy address belongs to a 8-page vmalloc region starting at 0xffffc90003650000 allocated at copy_process+0x54b/0x3c00 kernel/fork.c:2002
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888024c63200 pfn:0x24c62
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 0000000000000000 dead000000000122 0000000000000000
raw: ffff888024c63200 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO|__GFP_NOWARN), pid 5845, tgid 5845 (syz-executor), ts 59049058263, free_ts 59031992240
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1851
prep_new_page mm/page_alloc.c:1859 [inline]
get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3858
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5148
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
alloc_frozen_pages_noprof mm/mempolicy.c:2487 [inline]
alloc_pages_noprof+0xa9/0x190 mm/mempolicy.c:2507
vm_area_alloc_pages mm/vmalloc.c:3642 [inline]
__vmalloc_area_node mm/vmalloc.c:3720 [inline]
__vmalloc_node_range_noprof+0x97d/0x12f0 mm/vmalloc.c:3893
__vmalloc_node_noprof+0xc2/0x110 mm/vmalloc.c:3956
alloc_thread_stack_node kernel/fork.c:318 [inline]
dup_task_struct+0x3e7/0x860 kernel/fork.c:879
copy_process+0x54b/0x3c00 kernel/fork.c:2002
kernel_clone+0x21e/0x840 kernel/fork.c:2603
__do_sys_clone3 kernel/fork.c:2907 [inline]
__se_sys_clone3+0x256/0x2d0 kernel/fork.c:2886
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5907 tgid 5907 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1395 [inline]
__free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2895
vfree+0x25a/0x400 mm/vmalloc.c:3434
kcov_put kernel/kcov.c:439 [inline]
kcov_close+0x28/0x50 kernel/kcov.c:535
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x6b5/0x2300 kernel/exit.c:966
do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
get_signal+0x1286/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0x9a/0x750 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop+0x75/0x110 kernel/entry/common.c:40
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
ffffc90003655e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc90003655e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc90003655f00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f2 f2
^
ffffc90003655f80: 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3
ffffc90003656000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
***
PANIC: double fault in its_return_thunk
tree: bpf-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base: f3af62b6cee8af9f07012051874af2d2a451f0e5
arch: amd64
compiler: Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
config: https://ci.syzbot.org/builds/5e5c6698-7b84-4bf2-a1ee-1b6223c8d4c3/config
C repro: https://ci.syzbot.org/findings/1bf5dce6-467f-4bcd-9357-2726101d2ad1/c_repro
syz repro: https://ci.syzbot.org/findings/1bf5dce6-467f-4bcd-9357-2726101d2ad1/syz_repro
traps: PANIC: double fault, error_code: 0x0
Oops: double fault: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 5789 Comm: syz-executor930 Not tainted 6.16.0-syzkaller-11113-gf3af62b6cee8-dirty #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:its_return_thunk+0x0/0x10 arch/x86/lib/retpoline.S:412
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <c3> cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e9 6b 2b b9 f5 cc
RSP: 0018:ffffffffa0000877 EFLAGS: 00010246
RAX: 2161df6de464b300 RBX: 4800be48c0315641 RCX: 2161df6de464b300
RDX: 0000000000000000 RSI: ffffffff8dba01ee RDI: ffff888105cc9cc0
RBP: eb7a3aa9e9c95e41 R08: ffffffff81000130 R09: ffffffff81000130
R10: ffffffff81d017ac R11: ffffffff8b7707da R12: 3145ffff888028c3
R13: ee8948f875894cf6 R14: 000002baf8c68348 R15: e1cb3861e8c93100
FS: 0000555557cbc380(0000) GS:ffff8880b862a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0000868 CR3: 0000000028468000 CR4: 00000000000006f0
Call Trace:
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:its_return_thunk+0x0/0x10 arch/x86/lib/retpoline.S:412
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <c3> cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e9 6b 2b b9 f5 cc
RSP: 0018:ffffffffa0000877 EFLAGS: 00010246
RAX: 2161df6de464b300 RBX: 4800be48c0315641 RCX: 2161df6de464b300
RDX: 0000000000000000 RSI: ffffffff8dba01ee RDI: ffff888105cc9cc0
RBP: eb7a3aa9e9c95e41 R08: ffffffff81000130 R09: ffffffff81000130
R10: ffffffff81d017ac R11: ffffffff8b7707da R12: 3145ffff888028c3
R13: ee8948f875894cf6 R14: 000002baf8c68348 R15: e1cb3861e8c93100
FS: 0000555557cbc380(0000) GS:ffff8880b862a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0000868 CR3: 0000000028468000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
0: cc int3
1: cc int3
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: cc int3
7: cc int3
8: cc int3
9: cc int3
a: cc int3
b: cc int3
c: cc int3
d: cc int3
e: cc int3
f: cc int3
10: cc int3
11: cc int3
12: cc int3
13: cc int3
14: cc int3
15: cc int3
16: cc int3
17: cc int3
18: cc int3
19: cc int3
1a: cc int3
1b: cc int3
1c: cc int3
1d: cc int3
1e: cc int3
1f: cc int3
20: cc int3
21: cc int3
22: cc int3
23: cc int3
24: cc int3
25: cc int3
26: cc int3
27: cc int3
28: cc int3
29: cc int3
* 2a: c3 ret <-- trapping instruction
2b: cc int3
2c: 90 nop
2d: 90 nop
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 90 nop
33: 90 nop
34: 90 nop
35: 90 nop
36: 90 nop
37: 90 nop
38: 90 nop
39: 90 nop
3a: e9 6b 2b b9 f5 jmp 0xf5b92baa
3f: cc int3
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* [PATCH v2 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-07 19:07 ` Yonghong Song
@ 2025-08-09 11:56 ` Arnaud Lecomte
2025-08-09 11:58 ` [PATCH v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-09 11:56 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, contact, daniel, eddyz87, haoluo,
john.fastabend, jolsa, kpsingh, linux-kernel, martin.lau, sdf,
song, syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
Add a new helper function stack_map_calculate_max_depth() that
computes the max depth for a stackmap.
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..532447606532 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}
+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @map_size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 map_size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = map_size / elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}
- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}
trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;
ips = trace->ip + skip;
--
2.43.0
* [PATCH v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-09 11:56 ` [PATCH v2 " Arnaud Lecomte
@ 2025-08-09 11:58 ` Arnaud Lecomte
2025-08-09 12:09 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-09 11:58 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 532447606532..30c4f7f2ccd1 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}
static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;
if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;
- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ if (max_depth < 0)
+ return -EFAULT;
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +344,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;
- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}
const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +376,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, pe_max_depth;
/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,24 +395,25 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;
trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
/* restore nr */
trace->nr = nr;
} else { /* user */
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
-
skip += nr_kernel;
if (skip > BPF_F_SKIP_FIELD_MASK)
return -EFAULT;
flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ pe_max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, pe_max_depth);
}
return ret;
}
--
2.43.0
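As a rough illustration of the overflow scenario (all names and sizes below are
made up for the example, not taken from the syzkaller reproducer): a
BPF_MAP_TYPE_STACK_TRACE map whose value_size only holds a few frames, paired
with a callchain that perf collects up to sysctl_perf_event_max_stack entries
deep, is exactly the situation the clamp above guards against.

/* Hypothetical BPF program for illustration only: each bucket has room for
 * 4 frames, while a kernel callchain can easily be much deeper. Before the
 * fix, the extra frames were written past the bucket; with the clamp, at
 * most 4 frames are stored.
 */
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_STACK_TRACE);
        __uint(max_entries, 16);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, 4 * sizeof(__u64));  /* 4 frames per bucket */
} small_stackmap SEC(".maps");

SEC("kprobe/do_nanosleep")
int probe(struct pt_regs *ctx)
{
        bpf_get_stackid(ctx, &small_stackmap, 0);
        return 0;
}

char LICENSE[] SEC("license") = "GPL";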
* [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-09 11:58 ` [PATCH v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-08-09 12:09 ` Arnaud Lecomte
2025-08-09 12:14 ` [PATCH RESEND v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-12 4:39 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
0 siblings, 2 replies; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-09 12:09 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs, Arnaud Lecomte
Add a new helper function, stack_map_calculate_max_depth(), that
computes the maximum allowed stack trace depth for a stackmap.
Changes in v2:
- Removed the 'map_size % map_elem_size' check from stack_map_calculate_max_depth
- Renamed the stack_map_calculate_max_depth params to be more generic
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..532447606532 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}
+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @map_size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 map_size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = map_size / elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}
- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}
trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;
ips = trace->ip + skip;
--
2.43.0
* [PATCH RESEND v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-09 12:09 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
@ 2025-08-09 12:14 ` Arnaud Lecomte
2025-08-12 4:39 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
1 sibling, 0 replies; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-09 12:14 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
Changes in v2:
- Fixed the max_depth naming across the bpf_get_stackid() helpers
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 532447606532..b3995724776c 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}
static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;
if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;
- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;
- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}
const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;
/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,16 +393,18 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;
trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
/* restore nr */
trace->nr = nr;
} else { /* user */
+
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
skip += nr_kernel;
@@ -409,7 +412,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
}
return ret;
}
--
2.43.0
* Re: [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-09 12:09 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
2025-08-09 12:14 ` [PATCH RESEND v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-08-12 4:39 ` Yonghong Song
2025-08-12 19:30 ` [PATCH bpf-next v3 " Arnaud Lecomte
2025-08-12 19:32 ` [PATCH RESEND v2 " Arnaud Lecomte
1 sibling, 2 replies; 28+ messages in thread
From: Yonghong Song @ 2025-08-12 4:39 UTC (permalink / raw)
To: Arnaud Lecomte
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/9/25 5:09 AM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
Please add 'bpf-next' in the subject like [PATCH bpf-next v2 1/2]
so CI can properly test the patch set.
>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..532447606532 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @map_size: Size of the buffer/map value in bytes
let us rename 'map_size' to 'size' since it represents the size of a
buffer or a map value, not just a map.
> + * @elem_size: Size of each stack trace element
> + * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored
> + */
> +static u32 stack_map_calculate_max_depth(u32 map_size, u32 elem_size, u64 flags)
map_size -> size
Also, you can replace 'flags' with 'skip', so the 'u32 skip = flags & BPF_F_SKIP_FIELD_MASK'
below is not necessary.
> +{
> + u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> + u32 max_depth;
> +
> + max_depth = map_size / elem_size;
> + max_depth += skip;
> + if (max_depth > sysctl_perf_event_max_stack)
> + return sysctl_perf_event_max_stack;
> +
> + return max_depth;
> +}
> +
> static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
> {
> u64 elem_size = sizeof(struct stack_map_bucket) +
> @@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> struct perf_callchain_entry *trace_in,
> void *buf, u32 size, u64 flags, bool may_fault)
> {
> - u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> + u32 trace_nr, copy_len, elem_size, max_depth;
> bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> bool crosstask = task && task != current;
> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> @@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
> }
>
> - num_elem = size / elem_size;
> - max_depth = num_elem + skip;
> - if (sysctl_perf_event_max_stack < max_depth)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
>
> if (may_fault)
> rcu_read_lock(); /* need RCU for perf's callchain below */
> @@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> }
>
> trace_nr = trace->nr - skip;
> - trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
> + trace_nr = min(trace_nr, max_depth - skip);
> copy_len = trace_nr * elem_size;
>
> ips = trace->ip + skip;
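A minimal sketch of the signature change suggested above, assuming the caller
passes the already-masked skip count; this only illustrates the suggestion and
is not the code that was eventually applied:

static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u32 skip)
{
        u32 max_depth = size / elem_size + skip;

        if (max_depth > sysctl_perf_event_max_stack)
                return sysctl_perf_event_max_stack;
        return max_depth;
}

/* callers would then do, e.g.:
 *      u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
 *      max_depth = stack_map_calculate_max_depth(size, elem_size, skip);
 */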
* [PATCH bpf-next v3 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-12 4:39 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
@ 2025-08-12 19:30 ` Arnaud Lecomte
2025-08-12 19:32 ` [PATCH bpf-next v3 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-13 5:54 ` [PATCH bpf-next v3 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
2025-08-12 19:32 ` [PATCH RESEND v2 " Arnaud Lecomte
1 sibling, 2 replies; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-12 19:30 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, contact, daniel, eddyz87, haoluo,
john.fastabend, jolsa, kpsingh, linux-kernel, martin.lau, sdf,
song, syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
Add a new helper function, stack_map_calculate_max_depth(), that
computes the maximum allowed stack trace depth for a stackmap.
Changes in v2:
- Removed the 'map_size % map_elem_size' check from
stack_map_calculate_max_depth
- Renamed the stack_map_calculate_max_depth params to be more generic
Changes in v3:
- Renamed the map size param to 'size' in the max depth helper
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..a267567e36dd 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}
+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = size / elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}
- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}
trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;
ips = trace->ip + skip;
--
2.43.0
* Re: [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-12 4:39 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
2025-08-12 19:30 ` [PATCH bpf-next v3 " Arnaud Lecomte
@ 2025-08-12 19:32 ` Arnaud Lecomte
1 sibling, 0 replies; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-12 19:32 UTC (permalink / raw)
To: Yonghong Song
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
Thanks Yonghong for your feedback and your patience!
On 12/08/2025 05:39, Yonghong Song wrote:
>
>
> On 8/9/25 5:09 AM, Arnaud Lecomte wrote:
>> A new helper function stack_map_calculate_max_depth() that
>> computes the max depth for a stackmap.
>
> Please add 'bpf-next' in the subject like [PATCH bpf-next v2 1/2]
> so CI can properly test the patch set.
>
>>
>> Changes in v2:
>> - Removed the checking 'map_size % map_elem_size' from
>> stack_map_calculate_max_depth
>> - Changed stack_map_calculate_max_depth params name to be more generic
>>
>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>> ---
>> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
>> 1 file changed, 24 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 3615c06b7dfa..532447606532 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct
>> bpf_map *map)
>> sizeof(struct bpf_stack_build_id) : sizeof(u64);
>> }
>> +/**
>> + * stack_map_calculate_max_depth - Calculate maximum allowed stack
>> trace depth
>> + * @map_size: Size of the buffer/map value in bytes
>
> let us rename 'map_size' to 'size' since the size represents size of
> buffer or map, not just for map.
>
>> + * @elem_size: Size of each stack trace element
>> + * @flags: BPF stack trace flags (BPF_F_USER_STACK,
>> BPF_F_USER_BUILD_ID, ...)
>> + *
>> + * Return: Maximum number of stack trace entries that can be safely
>> stored
>> + */
>> +static u32 stack_map_calculate_max_depth(u32 map_size, u32
>> elem_size, u64 flags)
>
> map_size -> size
> Also, you can replace 'flags' to 'skip', so below 'u32 skip = flags &
> BPF_F_SKIP_FIELD_MASK'
> is not necessary.
>
>> +{
>> + u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> + u32 max_depth;
>> +
>> + max_depth = map_size / elem_size;
>> + max_depth += skip;
>> + if (max_depth > sysctl_perf_event_max_stack)
>> + return sysctl_perf_event_max_stack;
>> +
>> + return max_depth;
>> +}
>> +
>> static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
>> {
>> u64 elem_size = sizeof(struct stack_map_bucket) +
>> @@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs,
>> struct task_struct *task,
>> struct perf_callchain_entry *trace_in,
>> void *buf, u32 size, u64 flags, bool may_fault)
>> {
>> - u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
>> + u32 trace_nr, copy_len, elem_size, max_depth;
>> bool user_build_id = flags & BPF_F_USER_BUILD_ID;
>> bool crosstask = task && task != current;
>> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> @@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs
>> *regs, struct task_struct *task,
>> goto clear;
>> }
>> - num_elem = size / elem_size;
>> - max_depth = num_elem + skip;
>> - if (sysctl_perf_event_max_stack < max_depth)
>> - max_depth = sysctl_perf_event_max_stack;
>> + max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
>> if (may_fault)
>> rcu_read_lock(); /* need RCU for perf's callchain below */
>> @@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs,
>> struct task_struct *task,
>> }
>> trace_nr = trace->nr - skip;
>> - trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
>> + trace_nr = min(trace_nr, max_depth - skip);
>> copy_len = trace_nr * elem_size;
>> ips = trace->ip + skip;
>
>
* [PATCH bpf-next v3 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-12 19:30 ` [PATCH bpf-next v3 " Arnaud Lecomte
@ 2025-08-12 19:32 ` Arnaud Lecomte
2025-08-13 5:59 ` Yonghong Song
2025-08-13 5:54 ` [PATCH bpf-next v3 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
1 sibling, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-12 19:32 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs, Arnaud Lecomte
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
Changes in v2:
- Fixed the max_depth naming across the bpf_get_stackid() helpers
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a267567e36dd..e1ee18cbbbb2 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}
static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;
if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;
- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;
- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}
const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;
/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,16 +393,18 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;
trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
/* restore nr */
trace->nr = nr;
} else { /* user */
+
u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
skip += nr_kernel;
@@ -409,7 +412,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
}
return ret;
}
--
2.43.0
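To make the effect of the new clamp concrete, here is a small standalone
sketch of the trace_nr computation; the callchain length, skip count and
bucket capacity are invented numbers, not values from the report.

/* Sketch of the clamp added to __bpf_get_stackid(). */
#include <stdio.h>

static unsigned int min_u32(unsigned int a, unsigned int b)
{
        return a < b ? a : b;
}

int main(void)
{
        unsigned int callchain_nr = 30; /* entries in the perf callchain */
        unsigned int skip = 2;          /* frames skipped via flags */
        unsigned int max_depth = 10;    /* from stack_map_calculate_max_depth() */
        unsigned int trace_nr = callchain_nr - skip;

        /* the bucket holds max_depth - skip = 8 entries, so clamp to 8 */
        trace_nr = min_u32(trace_nr, max_depth - skip);
        printf("frames stored: %u\n", trace_nr);
        return 0;
}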
* Re: [PATCH bpf-next v3 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-12 19:30 ` [PATCH bpf-next v3 " Arnaud Lecomte
2025-08-12 19:32 ` [PATCH bpf-next v3 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-08-13 5:54 ` Yonghong Song
1 sibling, 0 replies; 28+ messages in thread
From: Yonghong Song @ 2025-08-13 5:54 UTC (permalink / raw)
To: Arnaud Lecomte
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/12/25 12:30 PM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from
> stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Changes in v3:
> - Changed map size param to size in max depth helper
>
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
LGTM with a small nit below.
Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..a267567e36dd 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @size: Size of the buffer/map value in bytes
> + * @elem_size: Size of each stack trace element
> + * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
Let us have a consistent format, e.g.
* @size: Size of ...
* @elem_size: Size of ...
* @flags: BPF stack trace ...
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored
> + */
> +static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
> +{
> + u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> + u32 max_depth;
> +
> + max_depth = size / elem_size;
> + max_depth += skip;
> + if (max_depth > sysctl_perf_event_max_stack)
> + return sysctl_perf_event_max_stack;
> +
> + return max_depth;
> +}
> +
> static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
> {
> u64 elem_size = sizeof(struct stack_map_bucket) +
> @@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> struct perf_callchain_entry *trace_in,
> void *buf, u32 size, u64 flags, bool may_fault)
> {
> - u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> + u32 trace_nr, copy_len, elem_size, max_depth;
> bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> bool crosstask = task && task != current;
> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> @@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
> }
>
> - num_elem = size / elem_size;
> - max_depth = num_elem + skip;
> - if (sysctl_perf_event_max_stack < max_depth)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
>
> if (may_fault)
> rcu_read_lock(); /* need RCU for perf's callchain below */
> @@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> }
>
> trace_nr = trace->nr - skip;
> - trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
> + trace_nr = min(trace_nr, max_depth - skip);
> copy_len = trace_nr * elem_size;
>
> ips = trace->ip + skip;
* Re: [PATCH bpf-next v3 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-12 19:32 ` [PATCH bpf-next v3 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-08-13 5:59 ` Yonghong Song
2025-08-13 20:46 ` [PATCH bpf-next v4 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2025-08-13 5:59 UTC (permalink / raw)
To: Arnaud Lecomte
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/12/25 12:32 PM, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
>
> Changes in v2:
> - Fixed max_depth names across get stack id
>
> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
LGTM with a few nits below.
Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> kernel/bpf/stackmap.c | 24 ++++++++++++++----------
> 1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index a267567e36dd..e1ee18cbbbb2 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
> }
>
> static long __bpf_get_stackid(struct bpf_map *map,
> - struct perf_callchain_entry *trace, u64 flags)
> + struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
> {
> struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> @@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
>
> trace_nr = trace->nr - skip;
> trace_len = trace_nr * sizeof(u64);
> + trace_nr = min(trace_nr, max_depth - skip);
> +
> ips = trace->ip + skip;
> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
> id = hash & (smap->n_buckets - 1);
> @@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
> BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> u64, flags)
> {
> - u32 max_depth = map->value_size / stack_map_data_size(map);
> - u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> + u32 elem_size = stack_map_data_size(map);
> bool user = flags & BPF_F_USER_STACK;
> struct perf_callchain_entry *trace;
> bool kernel = !user;
> + u32 max_depth;
>
> if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
> return -EINVAL;
>
> - max_depth += skip;
> - if (max_depth > sysctl_perf_event_max_stack)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
>
> trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> false, false);
> @@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> /* couldn't fetch the stack trace */
> return -EFAULT;
>
> - return __bpf_get_stackid(map, trace, flags);
> + return __bpf_get_stackid(map, trace, flags, max_depth);
> }
>
> const struct bpf_func_proto bpf_get_stackid_proto = {
> @@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> bool kernel, user;
> __u64 nr_kernel;
> int ret;
> + u32 elem_size, max_depth;
>
> /* perf_sample_data doesn't have callchain, use bpf_get_stackid */
> if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
> @@ -392,16 +393,18 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> return -EFAULT;
>
> nr_kernel = count_kernel_ip(trace);
> -
> + elem_size = stack_map_data_size(map);
> if (kernel) {
> __u64 nr = trace->nr;
>
> trace->nr = nr_kernel;
> - ret = __bpf_get_stackid(map, trace, flags);
> + max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, max_depth);
>
> /* restore nr */
> trace->nr = nr;
> } else { /* user */
> +
Remove the above empty line.
> u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
>
> skip += nr_kernel;
> @@ -409,7 +412,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> return -EFAULT;
>
> flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
> - ret = __bpf_get_stackid(map, trace, flags);
> + max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, max_depth);
> }
> return ret;
> }
* [PATCH bpf-next v4 1/2] bpf: refactor max_depth computation in bpf_get_stack()
2025-08-13 5:59 ` Yonghong Song
@ 2025-08-13 20:46 ` Arnaud Lecomte
2025-08-13 20:55 ` [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-13 20:46 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, contact, daniel, eddyz87, haoluo,
john.fastabend, jolsa, kpsingh, linux-kernel, martin.lau, sdf,
song, syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
Add a new helper function, stack_map_calculate_max_depth(), that
computes the maximum allowed stack trace depth for a stackmap.
Changes in v2:
- Removed the 'map_size % map_elem_size' check from
stack_map_calculate_max_depth
- Renamed the stack_map_calculate_max_depth params to be more generic
Changes in v3:
- Renamed the map size param to 'size' in the max depth helper
Changes in v4:
- Fixed the argument indentation in the max depth helper
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 3615c06b7dfa..b9cc6c72a2a5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
sizeof(struct bpf_stack_build_id) : sizeof(u64);
}
+/**
+ * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
+ * @size: Size of the buffer/map value in bytes
+ * @elem_size: Size of each stack trace element
+ * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
+ *
+ * Return: Maximum number of stack trace entries that can be safely stored
+ */
+static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
+{
+ u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 max_depth;
+
+ max_depth = size / elem_size;
+ max_depth += skip;
+ if (max_depth > sysctl_perf_event_max_stack)
+ return sysctl_perf_event_max_stack;
+
+ return max_depth;
+}
+
static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
{
u64 elem_size = sizeof(struct stack_map_bucket) +
@@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
void *buf, u32 size, u64 flags, bool may_fault)
{
- u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
+ u32 trace_nr, copy_len, elem_size, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
bool crosstask = task && task != current;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
@@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
goto clear;
}
- num_elem = size / elem_size;
- max_depth = num_elem + skip;
- if (sysctl_perf_event_max_stack < max_depth)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
if (may_fault)
rcu_read_lock(); /* need RCU for perf's callchain below */
@@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
}
trace_nr = trace->nr - skip;
- trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+ trace_nr = min(trace_nr, max_depth - skip);
copy_len = trace_nr * elem_size;
ips = trace->ip + skip;
--
2.43.0
* [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-13 20:46 ` [PATCH bpf-next v4 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
@ 2025-08-13 20:55 ` Arnaud Lecomte
2025-08-18 13:49 ` Lecomte, Arnaud
0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-13 20:55 UTC (permalink / raw)
To: yonghong.song
Cc: andrii, ast, bpf, contact, daniel, eddyz87, haoluo,
john.fastabend, jolsa, kpsingh, linux-kernel, martin.lau, sdf,
song, syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
Changes in v2:
- Fixed the max_depth naming across the bpf_get_stackid() helpers
Changes in v4:
- Removed unnecessary empty line in __bpf_get_stackid
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
---
kernel/bpf/stackmap.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b9cc6c72a2a5..318f150460bb 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
}
static long __bpf_get_stackid(struct bpf_map *map,
- struct perf_callchain_entry *trace, u64 flags)
+ struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
{
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
@@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
trace_nr = trace->nr - skip;
trace_len = trace_nr * sizeof(u64);
+ trace_nr = min(trace_nr, max_depth - skip);
+
ips = trace->ip + skip;
hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
id = hash & (smap->n_buckets - 1);
@@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
- u32 max_depth = map->value_size / stack_map_data_size(map);
- u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+ u32 elem_size = stack_map_data_size(map);
bool user = flags & BPF_F_USER_STACK;
struct perf_callchain_entry *trace;
bool kernel = !user;
+ u32 max_depth;
if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
return -EINVAL;
- max_depth += skip;
- if (max_depth > sysctl_perf_event_max_stack)
- max_depth = sysctl_perf_event_max_stack;
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
false, false);
@@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
/* couldn't fetch the stack trace */
return -EFAULT;
- return __bpf_get_stackid(map, trace, flags);
+ return __bpf_get_stackid(map, trace, flags, max_depth);
}
const struct bpf_func_proto bpf_get_stackid_proto = {
@@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
bool kernel, user;
__u64 nr_kernel;
int ret;
+ u32 elem_size, max_depth;
/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
@@ -392,12 +393,13 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
nr_kernel = count_kernel_ip(trace);
-
+ elem_size = stack_map_data_size(map);
if (kernel) {
__u64 nr = trace->nr;
trace->nr = nr_kernel;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
/* restore nr */
trace->nr = nr;
@@ -409,7 +411,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
return -EFAULT;
flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- ret = __bpf_get_stackid(map, trace, flags);
+ max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
+ ret = __bpf_get_stackid(map, trace, flags, max_depth);
}
return ret;
}
--
2.43.0
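For completeness, a small userspace sketch that creates a stack map whose
buckets are far smaller than perf_event_max_stack, the kind of configuration
the clamp protects. The map name and sizes are assumptions for illustration;
this is only a sketch, not a selftest, and it needs libbpf 0.7+ for
bpf_map_create().

/* Create a BPF_MAP_TYPE_STACK_TRACE map with room for 4 frames per bucket. */
#include <stdio.h>
#include <bpf/bpf.h>

int main(void)
{
        int fd = bpf_map_create(BPF_MAP_TYPE_STACK_TRACE, "small_stack",
                                sizeof(__u32), 4 * sizeof(__u64), 16, NULL);

        if (fd < 0) {
                perror("bpf_map_create");
                return 1;
        }
        printf("stack map fd=%d, 4 frames per bucket\n", fd);
        return 0;
}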
* Re: [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-13 20:55 ` [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
@ 2025-08-18 13:49 ` Lecomte, Arnaud
2025-08-18 16:57 ` Yonghong Song
0 siblings, 1 reply; 28+ messages in thread
From: Lecomte, Arnaud @ 2025-08-18 13:49 UTC (permalink / raw)
To: song, jolsa
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend, jolsa,
kpsingh, linux-kernel, martin.lau, sdf, song,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs, yonghong.song
Hey,
Just forwarding the patch to the maintainers associated with `stackmap.c`.
Have a great day,
Cheers
On 13/08/2025 21:55, Arnaud Lecomte wrote:
> Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
> when copying stack trace data. The issue occurs when the perf trace
> contains more stack entries than the stack map bucket can hold,
> leading to an out-of-bounds write in the bucket's data array.
>
> Changes in v2:
> - Fixed max_depth names across get stack id
>
> Changes in v4:
> - Removed unnecessary empty line in __bpf_get_stackid
>
> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
> ---
> kernel/bpf/stackmap.c | 23 +++++++++++++----------
> 1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index b9cc6c72a2a5..318f150460bb 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -246,7 +246,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
> }
>
> static long __bpf_get_stackid(struct bpf_map *map,
> - struct perf_callchain_entry *trace, u64 flags)
> + struct perf_callchain_entry *trace, u64 flags, u32 max_depth)
> {
> struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
> struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
> @@ -262,6 +262,8 @@ static long __bpf_get_stackid(struct bpf_map *map,
>
> trace_nr = trace->nr - skip;
> trace_len = trace_nr * sizeof(u64);
> + trace_nr = min(trace_nr, max_depth - skip);
> +
> ips = trace->ip + skip;
> hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
> id = hash & (smap->n_buckets - 1);
> @@ -321,19 +323,17 @@ static long __bpf_get_stackid(struct bpf_map *map,
> BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> u64, flags)
> {
> - u32 max_depth = map->value_size / stack_map_data_size(map);
> - u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> + u32 elem_size = stack_map_data_size(map);
> bool user = flags & BPF_F_USER_STACK;
> struct perf_callchain_entry *trace;
> bool kernel = !user;
> + u32 max_depth;
>
> if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
> return -EINVAL;
>
> - max_depth += skip;
> - if (max_depth > sysctl_perf_event_max_stack)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
>
> trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> false, false);
> @@ -342,7 +342,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> /* couldn't fetch the stack trace */
> return -EFAULT;
>
> - return __bpf_get_stackid(map, trace, flags);
> + return __bpf_get_stackid(map, trace, flags, max_depth);
> }
>
> const struct bpf_func_proto bpf_get_stackid_proto = {
> @@ -374,6 +374,7 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> bool kernel, user;
> __u64 nr_kernel;
> int ret;
> + u32 elem_size, max_depth;
>
> /* perf_sample_data doesn't have callchain, use bpf_get_stackid */
> if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
> @@ -392,12 +393,13 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> return -EFAULT;
>
> nr_kernel = count_kernel_ip(trace);
> -
> + elem_size = stack_map_data_size(map);
> if (kernel) {
> __u64 nr = trace->nr;
>
> trace->nr = nr_kernel;
> - ret = __bpf_get_stackid(map, trace, flags);
> + max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, max_depth);
>
> /* restore nr */
> trace->nr = nr;
> @@ -409,7 +411,8 @@ BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
> return -EFAULT;
>
> flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
> - ret = __bpf_get_stackid(map, trace, flags);
> + max_depth = stack_map_calculate_max_depth(map->value_size, elem_size, flags);
> + ret = __bpf_get_stackid(map, trace, flags, max_depth);
> }
> return ret;
> }
* Re: [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-18 13:49 ` Lecomte, Arnaud
@ 2025-08-18 16:57 ` Yonghong Song
2025-08-18 17:02 ` Yonghong Song
0 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2025-08-18 16:57 UTC (permalink / raw)
To: Lecomte, Arnaud, song, jolsa
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend,
kpsingh, linux-kernel, martin.lau, sdf,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
> Hey,
> Just forwarding the patch to the associated maintainers with
> `stackmap.c`.
Arnaud, please add the Ack (provided in the comments for v3) to make things easier
for maintainers.
Also, it looks like all your patch sets (v1 to v4) are in the same thread.
It would be good to have each version in a separate thread.
Please look at some examples in the bpf mailing list.
> Have a great day,
> Cheers
>
> On 13/08/2025 21:55, Arnaud Lecomte wrote:
>> Syzkaller reported a KASAN slab-out-of-bounds write in
>> __bpf_get_stackid()
>> when copying stack trace data. The issue occurs when the perf trace
>> contains more stack entries than the stack map bucket can hold,
>> leading to an out-of-bounds write in the bucket's data array.
>>
>> Changes in v2:
>> - Fixed max_depth names across get stack id
>>
>> Changes in v4:
>> - Removed unnecessary empty line in __bpf_get_stackid
>>
>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>> ---
>> kernel/bpf/stackmap.c | 23 +++++++++++++----------
>> 1 file changed, 13 insertions(+), 10 deletions(-)
>>
[...]
* Re: [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-18 16:57 ` Yonghong Song
@ 2025-08-18 17:02 ` Yonghong Song
2025-08-19 16:20 ` Arnaud Lecomte
0 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2025-08-18 17:02 UTC (permalink / raw)
To: Lecomte, Arnaud, song, jolsa
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend,
kpsingh, linux-kernel, martin.lau, sdf,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 8/18/25 9:57 AM, Yonghong Song wrote:
>
>
> On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
>> Hey,
>> Just forwarding the patch to the associated maintainers with
>> `stackmap.c`.
>
> Arnaud, please add Ack (provided in comments for v3) to make things
> easier
> for maintainers.
>
> Also, looks like all your patch sets (v1 to v4) in the same thread.
sorry, it should be v3 and v4 in the same thread.
> It would be good to have all these versions in separate thread.
> Please look at some examples in bpf mailing list.
>
>> Have a great day,
>> Cheers
>>
>> On 13/08/2025 21:55, Arnaud Lecomte wrote:
>>> Syzkaller reported a KASAN slab-out-of-bounds write in
>>> __bpf_get_stackid()
>>> when copying stack trace data. The issue occurs when the perf trace
>>> contains more stack entries than the stack map bucket can hold,
>>> leading to an out-of-bounds write in the bucket's data array.
>>>
>>> Changes in v2:
>>> - Fixed max_depth names across get stack id
>>>
>>> Changes in v4:
>>> - Removed unnecessary empty line in __bpf_get_stackid
>>>
>>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>>> ---
>>> kernel/bpf/stackmap.c | 23 +++++++++++++----------
>>> 1 file changed, 13 insertions(+), 10 deletions(-)
>>>
> [...]
>
>
* Re: [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid()
2025-08-18 17:02 ` Yonghong Song
@ 2025-08-19 16:20 ` Arnaud Lecomte
0 siblings, 0 replies; 28+ messages in thread
From: Arnaud Lecomte @ 2025-08-19 16:20 UTC (permalink / raw)
To: Yonghong Song, song, jolsa
Cc: andrii, ast, bpf, daniel, eddyz87, haoluo, john.fastabend,
kpsingh, linux-kernel, martin.lau, sdf,
syzbot+c9b724fbb41cf2538b7b, syzkaller-bugs
On 18/08/2025 18:02, Yonghong Song wrote:
>
>
> On 8/18/25 9:57 AM, Yonghong Song wrote:
>>
>>
>> On 8/18/25 6:49 AM, Lecomte, Arnaud wrote:
>>> Hey,
>>> Just forwarding the patch to the associated maintainers with
>>> `stackmap.c`.
>>
>> Arnaud, please add Ack (provided in comments for v3) to make things
>> easier
>> for maintainers.
>>
>> Also, looks like all your patch sets (v1 to v4) in the same thread.
>
> sorry, it should be v3 and v4 in the same thread.
>
Hey, thanks for the feedback!
I am going to provide the link to v3 in the v4 commit and resend
v4 with the Acked-by.
>> It would be good to have all these versions in separate thread.
>> Please look at some examples in bpf mailing list.
>>
>>> Have a great day,
>>> Cheers
>>>
>>> On 13/08/2025 21:55, Arnaud Lecomte wrote:
>>>> Syzkaller reported a KASAN slab-out-of-bounds write in
>>>> __bpf_get_stackid()
>>>> when copying stack trace data. The issue occurs when the perf trace
>>>> contains more stack entries than the stack map bucket can hold,
>>>> leading to an out-of-bounds write in the bucket's data array.
>>>>
>>>> Changes in v2:
>>>> - Fixed max_depth names across get stack id
>>>>
>>>> Changes in v4:
>>>> - Removed unnecessary empty line in __bpf_get_stackid
>>>>
>>>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
>>>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
>>>> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
>>>> ---
>>>> kernel/bpf/stackmap.c | 23 +++++++++++++----------
>>>> 1 file changed, 13 insertions(+), 10 deletions(-)
>>>>
>> [...]
>>
>>
>
>
Thread overview: 28+ messages
2025-07-29 16:56 [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-07-29 22:45 ` Yonghong Song
2025-07-30 7:10 ` Arnaud Lecomte
2025-08-01 18:16 ` Lecomte, Arnaud
2025-08-05 20:49 ` Arnaud Lecomte
2025-08-06 1:52 ` Yonghong Song
2025-08-07 17:50 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
2025-08-07 17:52 ` [PATCH 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-07 19:05 ` Yonghong Song
2025-08-07 19:01 ` [PATCH 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
2025-08-07 19:07 ` Yonghong Song
2025-08-09 11:56 ` [PATCH v2 " Arnaud Lecomte
2025-08-09 11:58 ` [PATCH v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-09 12:09 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
2025-08-09 12:14 ` [PATCH RESEND v2 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-12 4:39 ` [PATCH RESEND v2 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
2025-08-12 19:30 ` [PATCH bpf-next v3 " Arnaud Lecomte
2025-08-12 19:32 ` [PATCH bpf-next v3 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-13 5:59 ` Yonghong Song
2025-08-13 20:46 ` [PATCH bpf-next v4 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
2025-08-13 20:55 ` [PATCH bpf-next v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-18 13:49 ` Lecomte, Arnaud
2025-08-18 16:57 ` Yonghong Song
2025-08-18 17:02 ` Yonghong Song
2025-08-19 16:20 ` Arnaud Lecomte
2025-08-13 5:54 ` [PATCH bpf-next v3 1/2] bpf: refactor max_depth computation in bpf_get_stack() Yonghong Song
2025-08-12 19:32 ` [PATCH RESEND v2 " Arnaud Lecomte
2025-08-08 7:30 ` [syzbot ci] " syzbot ci