All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v7 4/8] perf cs-etm: Use thread-stack for last branch entries
Date: Tue, 23 Jun 2026 19:46:19 +0000	[thread overview]
Message-ID: <20260623194619.B3B4E1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260611-b4-arm_cs_callchain_support_v1-v7-4-1ba770c862ae@arm.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] Cross-CPU State Corruption via global thread stack flush in `cs_etm__flush()`

Pre-existing issues:
- [Critical] Heap buffer overflow in `thread_stack__br_sample()` when `ts->br_stack_sz > sz`
- [High] Unbounded memory leak for per-CPU threads when `callstack` is false
- [High] NULL pointer dereference crash when `thread_stack__event()` is interleaved with `thread_stack__process()`
--

commit b1984c35dca6a97fee9de9c9a7a5db94fd997b41
Author: Leo Yan <leo.yan@arm.com>
perf cs-etm: Use thread-stack for last branch entries

This commit modifies the CoreSight ETM trace code to use the common
thread_stack helper functions for last branch entries. It removes the
private last_branch_rb circular buffer in favor of tracking branch history
at the thread level.

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4127120459418..8798bf0471faf 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
[ ... ]
> @@ -1614,6 +1527,42 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
[ ... ]
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> +				    struct cs_etm_traceid_queue *tidq)
> +{
[ ... ]
> +	if (etmq->etm->synth_opts.last_branch) {
> +		from = cs_etm__last_executed_instr(tidq->prev_packet);
> +		to = cs_etm__first_executed_instr(tidq->packet);
> +
> +		size = cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from);
> +
> +		/* Enable callchain so thread stack entry can be allocated */
> +		thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu,
> +				    tidq->prev_packet->flags, from, to, size,
> +				    etmq->buffer->buffer_nr + 1, false,
> +				    tidq->br_stack_sz, 0);

[Severity: High]
This is a pre-existing issue, but does passing false for callstack here
trigger an unbounded memory leak in thread_stack__event()?

In tools/perf/util/thread-stack.c, thread__cpu_stack() will
unconditionally return NULL if !ts->stack. Because callstack=false skips
allocating ts->stack, thread__cpu_stack() constantly returns NULL.

Subsequently, thread_stack__event() sees a missing stack and re-invokes
thread_stack__new(), allocating and leaking a new ts->br_stack_rb ring
buffer on every single branch event. This could lead to rapid OOM crashes
during trace decoding.

[Severity: High]
This is also a pre-existing issue, but could the new thread stack usage
cause a NULL pointer dereference crash if thread_stack__event() is
interleaved with thread_stack__process()?

If perf script is run with a DB export script, db_export__sample() calls
thread_stack__process(), which supersedes the thread stack with
br_stack_sz=0, leaving ts->br_stack_rb as NULL.

If a subsequent branch packet invokes thread_stack__event() here with
br_stack_sz > 0, thread_stack__update_br_stack() will unconditionally
dereference the NULL ts->br_stack_rb buffer.

[ ... ]
> @@ -1644,8 +1593,11 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  
>  	cs_etm__copy_insn(etmq, tidq, packet, &sample);
>  
> -	if (etm->synth_opts.last_branch)
> +	if (etm->synth_opts.last_branch) {
> +		thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu,
> +					tidq->last_branch, tidq->br_stack_sz);

[Severity: Critical]
This isn't a bug introduced by this patch, but does calling
thread_stack__br_sample() risk a heap buffer overflow here if
ts->br_stack_sz > sz?

In tools/perf/util/thread-stack.c:thread_stack__br_sample(), when wrapping
the ring buffer, the memcpy incorrectly uses ts->br_stack_pos as the size
scalar instead of the newly computed nr bound:

if (src->nr >= ts->br_stack_sz) {
    sz -= nr;
    be = &dst->entries[nr];
    nr = min(ts->br_stack_pos, sz);
    memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);
}

If another tracing mechanism (like intel_pt) inflates ts->br_stack_sz
beyond the requested branch stack size sz, this memcpy could write far
beyond the end of the dst->entries allocation.

[ ... ]
> @@ -2020,7 +1956,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)
> -		cs_etm__reset_last_branch_rb(tidq);
> +		thread_stack__flush(tidq->frontend_thread);

[Severity: High]
Could this global thread stack flush cause cross-CPU state corruption?

When a trace discontinuity or block end on a single CPU invokes
cs_etm__flush(), thread_stack__flush() unconditionally loops over all
CPUs for the given thread. 

For globally shared threads (like the idle thread) or multi-threaded
processes, a local trace flush on one CPU appears to improperly reset
and destroy the active branch stack arrays of all other concurrently
traced CPUs.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611-b4-arm_cs_callchain_support_v1-v7-0-1ba770c862ae@arm.com?part=4

  parent reply	other threads:[~2026-06-23 19:46 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11  7:56 [PATCH v7 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-06-11  7:56 ` [PATCH v7 1/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-06-11  8:58   ` James Clark
2026-06-11  7:56 ` [PATCH v7 2/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-06-11  7:56 ` [PATCH v7 3/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-06-11  7:56 ` [PATCH v7 4/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-06-11  9:01   ` James Clark
2026-06-23 19:46   ` sashiko-bot [this message]
2026-06-11  7:56 ` [PATCH v7 5/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-06-23 19:58   ` sashiko-bot
2026-06-11  7:57 ` [PATCH v7 6/8] perf cs-etm: Support call indentation Leo Yan
2026-06-23 20:11   ` sashiko-bot
2026-06-11  7:57 ` [PATCH v7 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-06-23 20:24   ` sashiko-bot
2026-06-11  7:57 ` [PATCH v7 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-06-11  9:11   ` James Clark
2026-06-11 12:42     ` Leo Yan
2026-06-23 20:31   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623194619.B3B4E1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=leo.yan@arm.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.