Re: [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries

Linux Perf Users

From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries
Date: Tue, 26 May 2026 18:32:25 +0000
Message-ID: <20260526183226.376271F000E9@smtp.kernel.org>
In-Reply-To: <20260526-b4-arm_cs_callchain_support_v1-v6-3-f9f49f53c9dd@arm.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Flushing the thread branch stack via thread_stack__flush() globally resets branch state for all CPUs a thread has run on, corrupting concurrent branch traces.
- [High] Unconditional NULL pointer dereference of `tidq->thread` in `cs_etm__flush()` when flushing the thread stack.
- [Medium] Forcing the creation of a full callstack introduces hidden CPU and memory overhead even when only branch tracking is requested.

Pre-existing issues:
- [Critical] Heap buffer overflow in thread_stack__br_sample() when copying branch entries.
- [High] NULL pointer dereference in thread_stack__update_br_stack() when processing mixed perf events.
--

commit 423cf82c1cfc3330f4878f3ed31b1cf46de41008
Author: Leo Yan <leo.yan@arm.com>

perf cs-etm: Use thread-stack for last branch entries

CS ETM maintains its own circular array for last branch entries, with
local helpers to update, copy and reset the branch stack. This duplicates
logic already provided by the common code.

Record branches with thread_stack__event() and synthesize the branch stack
with thread_stack__br_sample(). This removes the local last_branch_rb
buffer and position tracking.

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c

[ ... ]

> @@ -1579,6 +1492,37 @@ static inline u64 cs_etm__resolve_sample_time(
> +		/* Enable callchain so thread stack entry can be allocated */
> +		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
> +				    tidq->prev_packet->flags, from, to, size,
> +				    etmq->buffer->buffer_nr + 1, true,
> +				    tidq->br_stack_sz, 0);

[Severity: High]
This is a pre-existing issue, but is there a NULL pointer dereference in
thread_stack__update_br_stack() when processing mixed perf events?

If a thread stack is initially created by a normal PERF_RECORD_SAMPLE event
via thread_stack__process(), it is initialized with br_stack_sz = 0.
Consequently, ts->br_stack_rb is left as NULL.

When an auxtrace record (from intel-pt or cs-etm) is processed afterwards,
thread_stack__event() skips initialization because the thread stack already
exists.

Because tidq->br_stack_sz > 0, it then calls thread_stack__update_br_stack()
without checking whether ts->br_stack_rb was ever allocated. That function
dereferences ts->br_stack_rb->entries[...], causing an immediate crash.
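One way to close this gap, sketched here under stated assumptions, is to allocate the ring buffer lazily the first time a caller passes a non-zero br_stack_sz, and to skip the update if that allocation fails. The structures are simplified, hypothetical stand-ins for perf's branch_stack and thread_stack, and sketch_ensure_br_stack() is a made-up helper. The ring-filling order (downwards, newest entry at br_stack_pos) is meant to mirror thread_stack__update_br_stack(), but this is not perf's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for perf's branch_entry,
 * branch_stack and thread_stack; only the fields used here. */
struct sketch_branch_entry {
	uint64_t from;
	uint64_t to;
};

struct sketch_branch_stack {
	uint64_t nr;
	struct sketch_branch_entry entries[];
};

struct sketch_thread_stack {
	struct sketch_branch_stack *br_stack_rb; /* NULL if created with br_stack_sz == 0 */
	unsigned int br_stack_sz;
	unsigned int br_stack_pos;
};

/* Hypothetical helper: allocate the ring the first time a caller needs one. */
static int sketch_ensure_br_stack(struct sketch_thread_stack *ts,
				  unsigned int br_stack_sz)
{
	if (ts->br_stack_rb)
		return 0;

	ts->br_stack_rb = calloc(1, sizeof(*ts->br_stack_rb) +
				    br_stack_sz * sizeof(struct sketch_branch_entry));
	if (!ts->br_stack_rb)
		return -1;

	ts->br_stack_sz = br_stack_sz;
	ts->br_stack_pos = 0;
	return 0;
}

/* Record one branch; the ring fills downwards, newest entry at br_stack_pos. */
static int sketch_update_br_stack(struct sketch_thread_stack *ts,
				  unsigned int br_stack_sz,
				  uint64_t from, uint64_t to)
{
	struct sketch_branch_stack *bs;
	struct sketch_branch_entry *be;

	if (!br_stack_sz)
		return 0;		/* caller did not ask for branches */
	if (sketch_ensure_br_stack(ts, br_stack_sz))
		return -1;		/* out of memory: skip instead of crashing */

	bs = ts->br_stack_rb;
	if (!ts->br_stack_pos)
		ts->br_stack_pos = ts->br_stack_sz;
	ts->br_stack_pos--;
	assert(ts->br_stack_pos < ts->br_stack_sz);

	be = &bs->entries[ts->br_stack_pos];
	be->from = from;
	be->to = to;
	if (bs->nr < ts->br_stack_sz)
		bs->nr++;
	return 0;
}
```

Lazy allocation keeps stacks created by thread_stack__process() cheap. A simpler alternative is to return early whenever ts->br_stack_rb is NULL, at the cost of silently dropping branches for that thread.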

[Severity: Medium]
Does passing true for the callstack parameter force the creation of a full
callstack?

This causes thread_stack__event() to parse PERF_IP_FLAG_CALL and
PERF_IP_FLAG_RETURN events and maintain a full function call stack in memory.

Could this introduce hidden CPU and memory overhead even when users only
requested the lightweight --last-branch tracking?
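If the overhead matters, one option is to derive the callstack argument from the requested synthesis options instead of hard-coding true. The model below is a simplified, hypothetical sketch (all sketch_* names and flag values are made up) of the split described above: branch-ring work happens whenever branches are requested, while call/return tracking only runs when callstack is true. The patch comment says callchain is enabled so that the thread stack entry can be allocated, so whether passing false is safe here would need checking against how the thread stack is allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PERF_IP_FLAG_CALL / PERF_IP_FLAG_RETURN. */
#define SKETCH_FLAG_CALL	(1u << 0)
#define SKETCH_FLAG_RETURN	(1u << 1)
#define SKETCH_MAX_DEPTH	64

/* Hypothetical subset of the itrace synthesis options. */
struct sketch_synth_opts {
	bool last_branch;
	bool callchain;
};

struct sketch_ts {
	unsigned int nr_branches;		/* last-branch work, always done */
	uint64_t ret_addr[SKETCH_MAX_DEPTH];	/* call-stack work, only if callstack */
	size_t depth;
};

/* Only ask for a call stack when a consumer of it was requested. */
static bool sketch_want_callstack(const struct sketch_synth_opts *opts)
{
	return opts->callchain;
}

static void sketch_event(struct sketch_ts *ts, uint32_t flags, uint64_t from,
			 uint32_t insn_len, bool callstack)
{
	ts->nr_branches++;		/* the branch ring is updated either way */
	if (!callstack)
		return;			/* no call/return tracking requested */

	if ((flags & SKETCH_FLAG_CALL) && ts->depth < SKETCH_MAX_DEPTH)
		ts->ret_addr[ts->depth++] = from + insn_len;
	else if ((flags & SKETCH_FLAG_RETURN) && ts->depth)
		ts->depth--;
	assert(ts->depth <= SKETCH_MAX_DEPTH);
}
```

With only last-branch synthesis requested, the call stack in this model never grows, which is the saving the question is asking about.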

[ ... ]

> @@ -1608,8 +1552,12 @@ static int cs_etm__synth_instruction_sample(
>  
>  	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
>  
> -	if (etm->synth_opts.last_branch)
> +	if (etm->synth_opts.last_branch) {
> +		thread_stack__br_sample(tidq->thread, tidq->packet->cpu,
> +					tidq->last_branch,
> +					tidq->br_stack_sz);

[Severity: Critical]
This is a pre-existing issue, but does thread_stack__br_sample() have a heap
buffer overflow when copying branch entries?

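Once the ring buffer has wrapped (src->nr >= ts->br_stack_sz), the function
copies the most recent entries, from ts->br_stack_pos to the end of the ring,
and then wraps around to slot 0 for the rest. The second memcpy, intended to
copy those remaining entries, is:
memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);

It ignores the nr bound calculated just before it (the space left in the
destination) and copies ts->br_stack_pos entries, which can be up to
ts->br_stack_sz - 1, into the destination buffer. The copy is guaranteed to
stay in bounds only when ts->br_stack_sz is no larger than the destination
buffer. For Intel PT, ts->br_stack_sz can be 1024 while the destination size
is 16.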

Can this write hundreds of branch entries past the end of the array,
leading to a heap buffer overflow?
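The fix the question points at is to pass the clamped count to the second memcpy() (bsz * nr instead of bsz * ts->br_stack_pos). The sketch below models the copy on a simplified, hypothetical ring (sketch_ring and sketch_br_sample() are made-up names, not perf's types) and bounds both chunks by the destination size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_branch_entry {
	uint64_t from;
	uint64_t to;
};

/* Hypothetical ring: sz slots, filled downwards, newest entry at pos. */
struct sketch_ring {
	unsigned int sz;	/* like ts->br_stack_sz */
	unsigned int pos;	/* like ts->br_stack_pos */
	unsigned int nr;	/* like ts->br_stack_rb->nr */
	struct sketch_branch_entry *entries;
};

/* Copy newest-first into dst[0..dst_sz); returns the number copied. */
static unsigned int sketch_br_sample(const struct sketch_ring *rb,
				     struct sketch_branch_entry *dst,
				     unsigned int dst_sz)
{
	const size_t bsz = sizeof(*dst);
	unsigned int total, nr;

	if (!rb->nr)
		return 0;

	total = rb->nr < dst_sz ? rb->nr : dst_sz;

	/* First chunk: from pos to the end of the ring (most recent). */
	nr = rb->sz - rb->pos < total ? rb->sz - rb->pos : total;
	memcpy(dst, &rb->entries[rb->pos], bsz * nr);

	/* Second chunk: wrap to slot 0, bounded by the space left in dst. */
	if (rb->nr >= rb->sz) {
		unsigned int left = total - nr;
		unsigned int nr2 = rb->pos < left ? rb->pos : left;

		memcpy(&dst[nr], &rb->entries[0], bsz * nr2); /* nr2, not pos */
		nr += nr2;
	}
	assert(nr <= dst_sz);
	return nr;
}
```

With an 8-slot ring at pos 3 and a 4-entry destination, the unclamped version would write 3 extra entries past the end of the destination; this version stops at 4.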

[ ... ]

> @@ -1982,7 +1915,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)
> -		cs_etm__reset_last_branch_rb(tidq);
> +		thread_stack__flush(tidq->thread);

[Severity: High]
Does thread_stack__flush() globally reset the branch state for all CPUs a
thread has run on?

For threads that can run on multiple CPUs concurrently (e.g., the idle thread),
thread->ts points to an array holding one thread stack per CPU, and
thread_stack__flush() iterates over all ts->arr_sz entries and resets each of
them.

Could a trace discontinuity on one CPU's queue therefore wipe out the active
branch histories and call chains of the same thread on all other CPUs?
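One possible direction, assuming a per-CPU reset is what is wanted, is to reset only the stack that belongs to the queue's CPU. As far as this review shows, perf's thread-stack code has no such helper; sketch_flush_cpu() and the structures below are hypothetical and simplified, and only contrast the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-CPU thread stack state. */
struct sketch_ts {
	unsigned int br_nr;	/* stands in for the branch ring's entry count */
	unsigned int depth;	/* stands in for the call stack depth */
};

struct sketch_thread {
	struct sketch_ts *ts;	/* one entry, or one per CPU if per_cpu */
	unsigned int arr_sz;
	bool per_cpu;		/* e.g. the idle thread */
};

static void sketch_reset(struct sketch_ts *ts)
{
	assert(ts != NULL);
	ts->br_nr = 0;
	ts->depth = 0;
}

/* Models the behaviour described above: every array entry is reset. */
static void sketch_flush_all(struct sketch_thread *t)
{
	for (unsigned int i = 0; i < t->arr_sz; i++)
		sketch_reset(&t->ts[i]);
}

/* Hypothetical alternative: reset only the stack used by @cpu. */
static int sketch_flush_cpu(struct sketch_thread *t, int cpu)
{
	if (!t || !t->ts || !t->arr_sz)
		return 0;			/* nothing allocated yet */
	if (!t->per_cpu) {
		sketch_reset(&t->ts[0]);	/* single shared stack */
		return 0;
	}
	if (cpu < 0 || (unsigned int)cpu >= t->arr_sz)
		return -1;			/* no stack for this CPU */
	sketch_reset(&t->ts[cpu]);
	return 0;
}
```

A per-CPU reset leaves the other CPUs' branch histories and call chains intact, at the cost of callers having to know which CPU the discontinuity belongs to.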

[Severity: High]
Can tidq->thread be NULL here?

In cs_etm__init_traceid_queue(), tidq->thread is initialized via
machine__findnew_thread(), which may return NULL on allocation failure.

If tidq->thread is NULL, won't thread_stack__flush() cause an immediate NULL
pointer dereference?
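A minimal guard, assuming that skipping the reset is acceptable when the thread lookup failed, is to check tidq->thread before flushing. The model below uses hypothetical stand-ins (sketch_tidq, sketch_thread and sketch_thread_stack_flush()) rather than perf's types; the assert() marks where the real call would dereference NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct thread and the trace ID queue. */
struct sketch_thread {
	int flushed;
};

struct sketch_tidq {
	struct sketch_thread *thread;	/* may be NULL if the lookup failed */
};

/* Models thread_stack__flush() in this scenario: it uses its argument
 * without a NULL check, so the assert marks where a crash would occur. */
static void sketch_thread_stack_flush(struct sketch_thread *thread)
{
	assert(thread != NULL);
	thread->flushed++;
}

/* Guarded call site: returns true if a flush was performed. */
static bool sketch_flush_last_branch(struct sketch_tidq *tidq, bool last_branch)
{
	if (!last_branch || !tidq->thread)
		return false;
	sketch_thread_stack_flush(tidq->thread);
	return true;
}
```

In cs_etm__flush() this would correspond to adding `&& tidq->thread` to the existing etm->synth_opts.last_branch check.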

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260526-b4-arm_cs_callchain_support_v1-v6-0-f9f49f53c9dd@arm.com?part=3

Thread overview: 13+ messages
2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-05-26 16:59 ` [PATCH v6 1/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-05-26 16:59 ` [PATCH v6 2/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-05-26 16:59 ` [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-05-26 18:32   ` sashiko-bot [this message]
2026-05-26 16:59 ` [PATCH v6 4/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-05-26 19:10   ` sashiko-bot
2026-05-26 16:59 ` [PATCH v6 5/8] perf cs-etm: Support call indentation Leo Yan
2026-05-26 19:42   ` sashiko-bot
2026-05-26 16:59 ` [PATCH v6 6/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-05-26 16:59 ` [PATCH v6 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-05-26 16:59 ` [PATCH v6 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-05-26 20:56   ` sashiko-bot
