From: Martin KaFai Lau <martin.lau@linux.dev>
To: Arnaud Lecomte <contact@arnaud-lcm.com>, yonghong.song@linux.dev
Cc: andrii@kernel.org, ast@kernel.org, bpf@vger.kernel.org,
daniel@iogearbox.net, eddyz87@gmail.com, haoluo@google.com,
john.fastabend@gmail.com, jolsa@kernel.org, kpsingh@kernel.org,
linux-kernel@vger.kernel.org, sdf@fomichev.me,
syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com,
syzkaller-bugs@googlegroups.com, song@kernel.org
Subject: Re: [PATCH bpf-next RESEND v4 1/2] bpf: refactor max_depth computation in bpf_get_stack()
Date: Tue, 19 Aug 2025 14:15:08 -0700 [thread overview]
Message-ID: <3d8fe484-2889-4367-9405-91aeee7d2ef0@linux.dev> (raw)
In-Reply-To: <20250819162652.8776-1-contact@arnaud-lcm.com>
On 8/19/25 9:26 AM, Arnaud Lecomte wrote:
> A new helper function stack_map_calculate_max_depth() that
> computes the max depth for a stackmap.
>
> Changes in v2:
> - Removed the checking 'map_size % map_elem_size' from
> stack_map_calculate_max_depth
> - Changed stack_map_calculate_max_depth params name to be more generic
>
> Changes in v3:
> - Changed map size param to size in max depth helper
>
> Changes in v4:
> - Fixed indentation in max depth helper for args
>
> Link to v3: https://lore.kernel.org/all/09dc40eb-a84e-472a-8a68-36a2b1835308@linux.dev/
>
> Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> kernel/bpf/stackmap.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..b9cc6c72a2a5 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -42,6 +42,27 @@ static inline int stack_map_data_size(struct bpf_map *map)
> sizeof(struct bpf_stack_build_id) : sizeof(u64);
> }
>
> +/**
> + * stack_map_calculate_max_depth - Calculate maximum allowed stack trace depth
> + * @size: Size of the buffer/map value in bytes
> + * @elem_size: Size of each stack trace element
> + * @flags: BPF stack trace flags (BPF_F_USER_STACK, BPF_F_USER_BUILD_ID, ...)
> + *
> + * Return: Maximum number of stack trace entries that can be safely stored
> + */
> +static u32 stack_map_calculate_max_depth(u32 size, u32 elem_size, u64 flags)
> +{
> + u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> + u32 max_depth;
> +
> + max_depth = size / elem_size;
> + max_depth += skip;
> + if (max_depth > sysctl_perf_event_max_stack)
> + return sysctl_perf_event_max_stack;
hmm... this looks a bit suspicious. Is it possible that
sysctl_perf_event_max_stack is being changed to a larger value in parallel?
> +
> + return max_depth;
> +}
> +
> static int prealloc_elems_and_freelist(struct bpf_stack_map *smap)
> {
> u64 elem_size = sizeof(struct stack_map_bucket) +
> @@ -406,7 +427,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> struct perf_callchain_entry *trace_in,
> void *buf, u32 size, u64 flags, bool may_fault)
> {
> - u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
> + u32 trace_nr, copy_len, elem_size, max_depth;
> bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> bool crosstask = task && task != current;
> u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> @@ -438,10 +459,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> goto clear;
> }
>
> - num_elem = size / elem_size;
> - max_depth = num_elem + skip;
> - if (sysctl_perf_event_max_stack < max_depth)
> - max_depth = sysctl_perf_event_max_stack;
> + max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
>
> if (may_fault)
> rcu_read_lock(); /* need RCU for perf's callchain below */
> @@ -461,7 +479,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> }
>
> trace_nr = trace->nr - skip;
> - trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
I suspect it was fine because trace_nr was still bounded by num_elem.
> + trace_nr = min(trace_nr, max_depth - skip);
but now the min() is also based on max_depth which could be
sysctl_perf_event_max_stack.
beside, if I read it correctly, in "max_depth - skip", the max_depth could also
be less than skip. I assume trace->nr is bound by max_depth, so should be less
of a problem but still a bit unintuitive to read.
> copy_len = trace_nr * elem_size;
>
> ips = trace->ip + skip;
next prev parent reply other threads:[~2025-08-19 21:15 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-19 16:26 [PATCH bpf-next RESEND v4 1/2] bpf: refactor max_depth computation in bpf_get_stack() Arnaud Lecomte
2025-08-19 16:29 ` [PATCH bpf-next RESEND v4 2/2] bpf: fix stackmap overflow check in __bpf_get_stackid() Arnaud Lecomte
2025-08-19 21:15 ` Martin KaFai Lau [this message]
2025-08-25 16:39 ` [PATCH bpf-next RESEND v4 1/2] bpf: refactor max_depth computation in bpf_get_stack() Lecomte, Arnaud
2025-08-25 18:27 ` Yonghong Song
2025-08-25 20:07 ` Lecomte, Arnaud
2025-08-25 21:15 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3d8fe484-2889-4367-9405-91aeee7d2ef0@linux.dev \
--to=martin.lau@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=contact@arnaud-lcm.com \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.