From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-177.mta1.migadu.com (out-177.mta1.migadu.com [95.215.58.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8150F15B135 for ; Wed, 6 Aug 2025 01:52:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.177 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754445143; cv=none; b=G9b2V46pP/HLxbc84i2aQyHhCZVK56ZXhcT0pVzP/6dodOkWF2OzQUbszuAc9LL/nFxyEJI5n3LIvoG+WTltRfEvPANSorYhUFDqzNJgd/Z+UVtPPHGhS/50hxMXvbnNr8uL2u138Eh+YDW+fm4drHsoEqcwRohjVYFfXH08Mp8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754445143; c=relaxed/simple; bh=b9f2rD1YfLu11WXwEctrIPJGra3IBhhwAmmXYkomVtE=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=a0dwDNJMMvJ90cdDjbyNYPfUfocf7x5L1OKgz+3WOSzhKXGIBUZtqetx/4L0VQWuFOABxG2UbBCMi0pyWXSFSA7BU1Km6Bu+1Equ/spMb1Bf4GhTsGvUtTJfVIvWS4sQODtNPRpJwnExdCZJZgvjVz1vRwNnW8MM62G0uhZbXjA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=F2mDjGsv; arc=none smtp.client-ip=95.215.58.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="F2mDjGsv" Message-ID: <6cc26e1f-6ad6-44cd-a049-c4e7af9a229a@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1754445138; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8Rw84AxFvQ89TE/hsNrqClvGXCtds3haEjZkh5cXuYY=; b=F2mDjGsvg/ZhTwCHgJWl1vysoQHmg4gQsLQbS8ohwxJhkbECwfzIofiZ1UkNdQGVpob/2U lPim9f7XYXY4JRotsybJVWvKB3dtVt5ltdJCo7y/ikuui95ac1uKT8Enr0z+xf5i2X/Jx4 LZLd4sdcmpbd7uOZAjktvH3XbojTrHU= Date: Tue, 5 Aug 2025 18:52:02 -0700 Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH v2] bpf: fix stackmap overflow check in __bpf_get_stackid() Content-Language: en-GB To: Arnaud Lecomte , song@kernel.org, jolsa@kernel.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com, syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com References: <20250729165622.13794-1-contact@arnaud-lcm.com> <2b69e397-a457-4dba-86f1-47b7fe87ef79@linux.dev> <5124b615-3a71-4a44-a497-eea3b5964fda@arnaud-lcm.com> <6ce83e5c-de34-4ef2-b9f4-2ad15e645969@arnaud-lcm.com> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yonghong Song In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 8/5/25 1:49 PM, Arnaud Lecomte wrote: > Hi, > I gave it several tries and I can't find a nice to do see properly. > The main challenge is to find a way to detect memory corruption. I > wanted to place a canary value >  by tweaking the map size but we don't have a way from a BPF program > perspective to access to the size > of a stack_map_bucket. If we decide to do this computation manually, > we would end-up with maintainability >  issues: > #include "vmlinux.h" > #include "bpf/bpf_helpers.h" > > #define MAX_STACK_DEPTH 32 > #define CANARY_VALUE 0xBADCAFE > > /* Calculate size based on known layout: >  * - fnode: sizeof(void*) >  * - hash: 4 bytes >  * - nr: 4 bytes >  * - data: MAX_STACK_DEPTH * 8 bytes >  * - canary: 8 bytes >  */ > #define VALUE_SIZE (sizeof(void*) + 4 + 4 + (MAX_STACK_DEPTH * 8) + 8) > > struct { >     __uint(type, BPF_MAP_TYPE_STACK_TRACE); >     __uint(max_entries, 1); >     __uint(value_size, VALUE_SIZE); >     __uint(key_size, sizeof(u32)); > } stackmap SEC(".maps"); > > static __attribute__((noinline)) void recursive_helper(int depth) { >     if (depth <= 0) return; >     asm volatile("" ::: "memory"); >     recursive_helper(depth - 1); > } > > SEC("kprobe/do_sys_open") > int test_stack_overflow(void *ctx) { >     u32 key = 0; >     u64 *stack = bpf_map_lookup_elem(&stackmap, &key); >     if (!stack) return 0; > >     stack[MAX_STACK_DEPTH] = CANARY_VALUE; > >     /* Force minimum stack depth */ >     recursive_helper(MAX_STACK_DEPTH + 10); > >     (void)bpf_get_stackid(ctx, &stackmap, 0); >     return 0; > } > > char _license[] SEC("license") = "GPL"; It looks like it hard to trigger memory corruption inside the kernel. Maybe kasan can detect it for your specific example. If without selftests, you can do the following: __bpf_get_stack() already solved the problem you tried to fix. I suggest you refactor some portions of the code in __bpf_get_stack() to set trace_nr properly, and then you can use that refactored function in __bpf_get_stackid(). So two patches: 1. refactor portion of codes (related elem_size/trace_nr) in __bpf_get_stack(). 2. fix the issue in __bpf_get_stackid() with newly created function. > > On 01/08/2025 19:16, Lecomte, Arnaud wrote: >> Well, it turns out it is less straightforward than it looked like to >> detect the memory corruption >>  without KASAN. I am currently in holidays for the next 3 days so >> I've limited access to a >> computer. I should be able to sort this out on monday. >> >> Thanks, >> Arnaud >> >> On 30/07/2025 08:10, Arnaud Lecomte wrote: >>> On 29/07/2025 23:45, Yonghong Song wrote: >>>> >>>> >>>> On 7/29/25 9:56 AM, Arnaud Lecomte wrote: >>>>> Syzkaller reported a KASAN slab-out-of-bounds write in >>>>> __bpf_get_stackid() >>>>> when copying stack trace data. The issue occurs when the perf trace >>>>>   contains more stack entries than the stack map bucket can hold, >>>>>   leading to an out-of-bounds write in the bucket's data array. >>>>> For build_id mode, we use sizeof(struct bpf_stack_build_id) >>>>>   to determine capacity, and for normal mode we use sizeof(u64). >>>>> >>>>> Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com >>>>> Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b >>>>> Tested-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com >>>>> Signed-off-by: Arnaud Lecomte >>>> >>>> Could you add a selftest? This way folks can easily find out what is >>>> the problem and why this fix solves the issue correctly. >>>> >>> Sure, will be done after work >>> Thanks, >>> Arnaud >>>>> --- >>>>> Changes in v2: >>>>>   - Use utilty stack_map_data_size to compute map stack map size >>>>> --- >>>>>   kernel/bpf/stackmap.c | 8 +++++++- >>>>>   1 file changed, 7 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c >>>>> index 3615c06b7dfa..6f225d477f07 100644 >>>>> --- a/kernel/bpf/stackmap.c >>>>> +++ b/kernel/bpf/stackmap.c >>>>> @@ -230,7 +230,7 @@ static long __bpf_get_stackid(struct bpf_map >>>>> *map, >>>>>       struct bpf_stack_map *smap = container_of(map, struct >>>>> bpf_stack_map, map); >>>>>       struct stack_map_bucket *bucket, *new_bucket, *old_bucket; >>>>>       u32 skip = flags & BPF_F_SKIP_FIELD_MASK; >>>>> -    u32 hash, id, trace_nr, trace_len, i; >>>>> +    u32 hash, id, trace_nr, trace_len, i, max_depth; >>>>>       bool user = flags & BPF_F_USER_STACK; >>>>>       u64 *ips; >>>>>       bool hash_matches; >>>>> @@ -241,6 +241,12 @@ static long __bpf_get_stackid(struct bpf_map >>>>> *map, >>>>>         trace_nr = trace->nr - skip; >>>>>       trace_len = trace_nr * sizeof(u64); >>>>> + >>>>> +    /* Clamp the trace to max allowed depth */ >>>>> +    max_depth = smap->map.value_size / stack_map_data_size(map); >>>>> +    if (trace_nr > max_depth) >>>>> +        trace_nr = max_depth; >>>>> + >>>>>       ips = trace->ip + skip; >>>>>       hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0); >>>>>       id = hash & (smap->n_buckets - 1); >>>> >>>>