From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries
Date: Thu, 11 Jun 2026 17:18:46 +0000 [thread overview]
Message-ID: <20260611171846.DC8031F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260611-b4-arm_cs_callchain_support_v1-v8-4-737948584fea@arm.com>
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [Medium] Branch history and callchain are corrupted in CPU-wide trace mode due to sharing a single `thread_stack` across all CPUs.
- [Medium] Improper handling of trace discontinuity via `thread_stack__flush()` causes stale branches and cross-CPU corruption.
- [Medium] Missing NULL check on thread allocation leads to segmentation fault in newly added `thread_stack__flush()`.
Pre-existing issues:
- [Critical] Heap buffer overflow (out-of-bounds write) in `thread_stack__br_sample` due to incorrect `memcpy` length.
--
commit 0a266f6d8db7cc2c2667bdbd916921a70696b02b
Author: Leo Yan <leo.yan@arm.com>
perf cs-etm: Use thread-stack for last branch entries
This commit replaces the custom CoreSight ETM last branch circular array
with the common thread_stack helpers. It records taken branches with
thread_stack__event() and synthesizes PERF_SAMPLE_BRANCH_STACK data with
thread_stack__br_sample().
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4127120459418..8798bf0471faf 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
[ ... ]
> @@ -1614,6 +1527,42 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
> return etm->latest_kernel_timestamp;
> }
>
> +static bool cs_etm__packet_has_taken_branch(struct cs_etm_packet *packet)
> +{
> + if (packet->sample_type == CS_ETM_RANGE &&
> + packet->last_instr_taken_branch)
> + return true;
> +
> + return false;
> +}
> +
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> + struct cs_etm_traceid_queue *tidq)
> +{
> + u64 from, to;
> + int size;
> +
> + if (!cs_etm__packet_has_taken_branch(tidq->prev_packet))
> + return;
> +
> + if (etmq->etm->synth_opts.last_branch) {
> + from = cs_etm__last_executed_instr(tidq->prev_packet);
> + to = cs_etm__first_executed_instr(tidq->packet);
> +
> + size = cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from);
> +
> + /* Enable callchain so thread stack entry can be allocated */
> + thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu,
> + tidq->prev_packet->flags, from, to, size,
> + etmq->buffer->buffer_nr + 1, false,
> + tidq->br_stack_sz, 0);
[Severity: Medium]
In CPU-wide tracing mode where hardware context IDs are absent, queue->tid
is initialized to -1. machine__findnew_thread() assigns the same dummy
thread (TID=-1) to all trace queues. Because thread_stack__per_cpu() only
handles TID=0 for idle tasks, all CPU trace queues will share the exact
same thread_stack object.
Does this cause thread_stack__event() to push branches from multiple CPUs
into the same shared branch stack ring buffer, corrupting the trace output
by interleaving them?
> + } else {
> + thread_stack__set_trace_nr(tidq->frontend_thread,
> + tidq->prev_packet->cpu,
> + etmq->buffer->buffer_nr + 1);
> + }
> +}
> +
[ ... ]
> @@ -1644,8 +1593,11 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>
> cs_etm__copy_insn(etmq, tidq, packet, &sample);
>
> - if (etm->synth_opts.last_branch)
> + if (etm->synth_opts.last_branch) {
> + thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu,
> + tidq->last_branch, tidq->br_stack_sz);
[Severity: Critical]
This is a pre-existing issue, but calling thread_stack__br_sample() can
trigger an out-of-bounds write in that function.
In thread_stack__br_sample(), when copying from the wrapped ring buffer, it
calculates bounds using nr but ignores it in memcpy:
nr = min(ts->br_stack_pos, sz);
memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);
If the destination buffer size sz is smaller than ts->br_stack_pos, does the
memcpy write beyond the allocated bounds for nr elements?
> sample.branch_stack = tidq->last_branch;
> + }
[ ... ]
> @@ -2020,7 +1956,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>
> /* Reset last branches after flush the trace */
> if (etm->synth_opts.last_branch)
> - cs_etm__reset_last_branch_rb(tidq);
> + thread_stack__flush(tidq->frontend_thread);
[Severity: Medium]
In cs_etm__init_traceid_queue(), the return value of
machine__findnew_thread() is not checked, meaning tidq->frontend_thread
could be NULL upon memory allocation failure.
Does thread_stack__flush() dereference tidq->frontend_thread without
validation, potentially leading to a segmentation fault on error paths?
[Severity: Medium]
Looking at thread_stack__flush(), it clears the call stack but does not
reset the branch stack array (ts->br_stack_pos and ts->br_stack_rb remain
intact).
Will this leave stale branch entries from before a trace gap to be prepended
to the new trace segment?
Additionally, thread_stack__flush() iterates and flushes stacks for all
CPUs associated with the thread. For a shared dummy thread (TID=-1) or the
idle thread (TID=0), does this inappropriately flush and destroy in-progress
histories for all other CPUs?
>
> return err;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260611-b4-arm_cs_callchain_support_v1-v8-0-737948584fea@arm.com?part=4
next prev parent reply other threads:[~2026-06-11 17:18 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-11 14:50 [PATCH v8 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-06-11 14:50 ` [PATCH v8 1/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-06-11 14:50 ` [PATCH v8 2/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-06-11 14:50 ` [PATCH v8 3/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-06-11 14:50 ` [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-06-11 17:18 ` sashiko-bot [this message]
2026-06-11 14:50 ` [PATCH v8 5/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-06-11 15:06 ` sashiko-bot
2026-06-11 18:00 ` Leo Yan
2026-06-11 18:06 ` Leo Yan
2026-06-11 14:50 ` [PATCH v8 6/8] perf cs-etm: Support call indentation Leo Yan
2026-06-11 15:11 ` sashiko-bot
2026-06-11 14:50 ` [PATCH v8 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-06-11 15:05 ` sashiko-bot
2026-06-11 14:50 ` [PATCH v8 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-06-11 15:06 ` James Clark
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260611171846.DC8031F00893@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=leo.yan@arm.com \
--cc=linux-perf-users@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox