Linux Perf Users
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v7 4/8] perf cs-etm: Use thread-stack for last branch entries
Date: Tue, 23 Jun 2026 19:46:19 +0000	[thread overview]
Message-ID: <20260623194619.B3B4E1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260611-b4-arm_cs_callchain_support_v1-v7-4-1ba770c862ae@arm.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] Cross-CPU State Corruption via global thread stack flush in `cs_etm__flush()`

Pre-existing issues:
- [Critical] Heap buffer overflow in `thread_stack__br_sample()` when `ts->br_stack_sz > sz`
- [High] Unbounded memory leak for per-CPU threads when `callstack` is false
- [High] NULL pointer dereference crash when `thread_stack__event()` is interleaved with `thread_stack__process()`
--

commit b1984c35dca6a97fee9de9c9a7a5db94fd997b41
Author: Leo Yan <leo.yan@arm.com>
perf cs-etm: Use thread-stack for last branch entries

This commit modifies the CoreSight ETM trace code to use the common
thread_stack helper functions for last branch entries. It removes the
private last_branch_rb circular buffer in favor of tracking branch history
at the thread level.

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4127120459418..8798bf0471faf 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
[ ... ]
> @@ -1614,6 +1527,42 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
[ ... ]
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> +				    struct cs_etm_traceid_queue *tidq)
> +{
[ ... ]
> +	if (etmq->etm->synth_opts.last_branch) {
> +		from = cs_etm__last_executed_instr(tidq->prev_packet);
> +		to = cs_etm__first_executed_instr(tidq->packet);
> +
> +		size = cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from);
> +
> +		/* Enable callchain so thread stack entry can be allocated */
> +		thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu,
> +				    tidq->prev_packet->flags, from, to, size,
> +				    etmq->buffer->buffer_nr + 1, false,
> +				    tidq->br_stack_sz, 0);

[Severity: High]
This is a pre-existing issue, but does passing false for callstack here
trigger an unbounded memory leak in thread_stack__event()?

In tools/perf/util/thread-stack.c, thread__cpu_stack() will
unconditionally return NULL if !ts->stack. Because callstack=false skips
allocating ts->stack, thread__cpu_stack() constantly returns NULL.

Subsequently, thread_stack__event() sees a missing stack and re-invokes
thread_stack__new(), allocating and leaking a new ts->br_stack_rb ring
buffer on every single branch event. This could lead to rapid OOM crashes
during trace decoding.

[Severity: High]
This is also a pre-existing issue, but could the new thread stack usage
cause a NULL pointer dereference crash if thread_stack__event() is
interleaved with thread_stack__process()?

If perf script is run with a DB export script, db_export__sample() calls
thread_stack__process(), which supersedes the thread stack with
br_stack_sz=0, leaving ts->br_stack_rb as NULL.

If a subsequent branch packet invokes thread_stack__event() here with
br_stack_sz > 0, thread_stack__update_br_stack() will unconditionally
dereference the NULL ts->br_stack_rb buffer.

[ ... ]
> @@ -1644,8 +1593,11 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  
>  	cs_etm__copy_insn(etmq, tidq, packet, &sample);
>  
> -	if (etm->synth_opts.last_branch)
> +	if (etm->synth_opts.last_branch) {
> +		thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu,
> +					tidq->last_branch, tidq->br_stack_sz);

[Severity: Critical]
This isn't a bug introduced by this patch, but does calling
thread_stack__br_sample() risk a heap buffer overflow here if
ts->br_stack_sz > sz?

In tools/perf/util/thread-stack.c:thread_stack__br_sample(), when wrapping
the ring buffer, the memcpy incorrectly uses ts->br_stack_pos as the size
scalar instead of the newly computed nr bound:

if (src->nr >= ts->br_stack_sz) {
    sz -= nr;
    be = &dst->entries[nr];
    nr = min(ts->br_stack_pos, sz);
    memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);
}

If another tracing mechanism (like intel_pt) inflates ts->br_stack_sz
beyond the requested branch stack size sz, this memcpy could write far
beyond the end of the dst->entries allocation.

[ ... ]
> @@ -2020,7 +1956,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)
> -		cs_etm__reset_last_branch_rb(tidq);
> +		thread_stack__flush(tidq->frontend_thread);

[Severity: High]
Could this global thread stack flush cause cross-CPU state corruption?

When a trace discontinuity or block end on a single CPU invokes
cs_etm__flush(), thread_stack__flush() unconditionally loops over all
CPUs for the given thread. 

For globally shared threads (like the idle thread) or multi-threaded
processes, a local trace flush on one CPU appears to improperly reset
and destroy the active branch stack arrays of all other concurrently
traced CPUs.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611-b4-arm_cs_callchain_support_v1-v7-0-1ba770c862ae@arm.com?part=4

  parent reply	other threads:[~2026-06-23 19:46 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11  7:56 [PATCH v7 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-06-11  7:56 ` [PATCH v7 1/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-06-11  8:58   ` James Clark
2026-06-11  7:56 ` [PATCH v7 2/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-06-11  7:56 ` [PATCH v7 3/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-06-11  7:56 ` [PATCH v7 4/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-06-11  9:01   ` James Clark
2026-06-23 19:46   ` sashiko-bot [this message]
2026-06-11  7:56 ` [PATCH v7 5/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-06-23 19:58   ` sashiko-bot
2026-06-11  7:57 ` [PATCH v7 6/8] perf cs-etm: Support call indentation Leo Yan
2026-06-23 20:11   ` sashiko-bot
2026-06-11  7:57 ` [PATCH v7 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-06-23 20:24   ` sashiko-bot
2026-06-11  7:57 ` [PATCH v7 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-06-11  9:11   ` James Clark
2026-06-11 12:42     ` Leo Yan
2026-06-23 20:31   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623194619.B3B4E1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=leo.yan@arm.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox