From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v7 4/8] perf cs-etm: Use thread-stack for last branch entries
Date: Tue, 23 Jun 2026 19:46:19 +0000 [thread overview]
Message-ID: <20260623194619.B3B4E1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260611-b4-arm_cs_callchain_support_v1-v7-4-1ba770c862ae@arm.com>
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [High] Cross-CPU State Corruption via global thread stack flush in `cs_etm__flush()`
Pre-existing issues:
- [Critical] Heap buffer overflow in `thread_stack__br_sample()` when `ts->br_stack_sz > sz`
- [High] Unbounded memory leak for per-CPU threads when `callstack` is false
- [High] NULL pointer dereference crash when `thread_stack__event()` is interleaved with `thread_stack__process()`
--
commit b1984c35dca6a97fee9de9c9a7a5db94fd997b41
Author: Leo Yan <leo.yan@arm.com>
perf cs-etm: Use thread-stack for last branch entries
This commit modifies the CoreSight ETM trace code to use the common
thread_stack helper functions for last branch entries. It removes the
private last_branch_rb circular buffer in favor of tracking branch history
at the thread level.
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4127120459418..8798bf0471faf 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
[ ... ]
> @@ -1614,6 +1527,42 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
[ ... ]
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> + struct cs_etm_traceid_queue *tidq)
> +{
[ ... ]
> + if (etmq->etm->synth_opts.last_branch) {
> + from = cs_etm__last_executed_instr(tidq->prev_packet);
> + to = cs_etm__first_executed_instr(tidq->packet);
> +
> + size = cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from);
> +
> + /* Enable callchain so thread stack entry can be allocated */
> + thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu,
> + tidq->prev_packet->flags, from, to, size,
> + etmq->buffer->buffer_nr + 1, false,
> + tidq->br_stack_sz, 0);
[Severity: High]
This is a pre-existing issue, but does passing false for callstack here
trigger an unbounded memory leak in thread_stack__event()?
In tools/perf/util/thread-stack.c, thread__cpu_stack() will
unconditionally return NULL if !ts->stack. Because callstack=false skips
allocating ts->stack, thread__cpu_stack() constantly returns NULL.
Subsequently, thread_stack__event() sees a missing stack and re-invokes
thread_stack__new(), allocating and leaking a new ts->br_stack_rb ring
buffer on every single branch event. This could lead to rapid OOM crashes
during trace decoding.
[Severity: High]
This is also a pre-existing issue, but could the new thread stack usage
cause a NULL pointer dereference crash if thread_stack__event() is
interleaved with thread_stack__process()?
If perf script is run with a DB export script, db_export__sample() calls
thread_stack__process(), which supersedes the thread stack with
br_stack_sz=0, leaving ts->br_stack_rb as NULL.
If a subsequent branch packet invokes thread_stack__event() here with
br_stack_sz > 0, thread_stack__update_br_stack() will unconditionally
dereference the NULL ts->br_stack_rb buffer.
[ ... ]
> @@ -1644,8 +1593,11 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>
> cs_etm__copy_insn(etmq, tidq, packet, &sample);
>
> - if (etm->synth_opts.last_branch)
> + if (etm->synth_opts.last_branch) {
> + thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu,
> + tidq->last_branch, tidq->br_stack_sz);
[Severity: Critical]
This isn't a bug introduced by this patch, but does calling
thread_stack__br_sample() risk a heap buffer overflow here if
ts->br_stack_sz > sz?
In tools/perf/util/thread-stack.c:thread_stack__br_sample(), when wrapping
the ring buffer, the memcpy incorrectly uses ts->br_stack_pos as the size
scalar instead of the newly computed nr bound:
if (src->nr >= ts->br_stack_sz) {
sz -= nr;
be = &dst->entries[nr];
nr = min(ts->br_stack_pos, sz);
memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);
}
If another tracing mechanism (like intel_pt) inflates ts->br_stack_sz
beyond the requested branch stack size sz, this memcpy could write far
beyond the end of the dst->entries allocation.
[ ... ]
> @@ -2020,7 +1956,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>
> /* Reset last branches after flush the trace */
> if (etm->synth_opts.last_branch)
> - cs_etm__reset_last_branch_rb(tidq);
> + thread_stack__flush(tidq->frontend_thread);
[Severity: High]
Could this global thread stack flush cause cross-CPU state corruption?
When a trace discontinuity or block end on a single CPU invokes
cs_etm__flush(), thread_stack__flush() unconditionally loops over all
CPUs for the given thread.
For globally shared threads (like the idle thread) or multi-threaded
processes, a local trace flush on one CPU appears to improperly reset
and destroy the active branch stack arrays of all other concurrently
traced CPUs.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260611-b4-arm_cs_callchain_support_v1-v7-0-1ba770c862ae@arm.com?part=4
next prev parent reply other threads:[~2026-06-23 19:46 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-11 7:56 [PATCH v7 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-06-11 7:56 ` [PATCH v7 1/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-06-11 8:58 ` James Clark
2026-06-11 7:56 ` [PATCH v7 2/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-06-11 7:56 ` [PATCH v7 3/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-06-11 7:56 ` [PATCH v7 4/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-06-11 9:01 ` James Clark
2026-06-23 19:46 ` sashiko-bot [this message]
2026-06-11 7:56 ` [PATCH v7 5/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-06-23 19:58 ` sashiko-bot
2026-06-11 7:57 ` [PATCH v7 6/8] perf cs-etm: Support call indentation Leo Yan
2026-06-23 20:11 ` sashiko-bot
2026-06-11 7:57 ` [PATCH v7 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-06-23 20:24 ` sashiko-bot
2026-06-11 7:57 ` [PATCH v7 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-06-11 9:11 ` James Clark
2026-06-11 12:42 ` Leo Yan
2026-06-23 20:31 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260623194619.B3B4E1F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=leo.yan@arm.com \
--cc=linux-perf-users@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox