From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v10 5/9] perf cs-etm: Use thread-stack for last branch entries
Date: Wed, 17 Jun 2026 20:56:09 +0000 [thread overview]
Message-ID: <20260617205609.D73851F00A3A@smtp.kernel.org> (raw)
In-Reply-To: <20260617-b4-arm_cs_callchain_support_v1-v10-5-e8b6e5d63db5@arm.com>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Medium] Integration with `thread_stack__event` exposes a fatal NULL pointer dereference when CS ETM branch tracing is used concurrently with db-export.
- [Medium] Calling `thread_stack__flush` with `frontend_thread` erroneously flushes branch stacks for all CPUs instead of just the affected channel.
Pre-existing issues:
- [Critical] Preexisting heap buffer overflow in `thread_stack__br_sample` when copying ring buffer elements.
--
commit 6bec7142808bcd09595848f0ba62d4272539a5be
Author: Leo Yan <leo.yan@arm.com>
perf cs-etm: Use thread-stack for last branch entries
This commit refactors CS ETM to use common thread-stack helpers for
last branch entries instead of maintaining its own circular array.
It removes the private last_branch_rb buffer and synthesizes branch stack
data using thread_stack__event() and thread_stack__br_sample().
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 6827ef8871a8f..5ede0f0ff8c6e 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
[ ... ]
> @@ -1619,6 +1532,57 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
> return etm->latest_kernel_timestamp;
> }
>
> +static bool cs_etm__packet_has_taken_branch(struct cs_etm_packet *packet)
> +{
> + if (packet->sample_type == CS_ETM_RANGE &&
> + packet->last_instr_taken_branch)
> + return true;
> +
> + return false;
> +}
> +
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> + struct cs_etm_traceid_queue *tidq)
> +{
> + struct cs_etm_auxtrace *etm = etmq->etm;
> + u64 from, to;
> + int size;
> +
> + if (!etm->synth_opts.branches && !etm->synth_opts.instructions)
> + return;
> +
> + if (!cs_etm__packet_has_taken_branch(tidq->prev_packet))
> + return;
> +
> + if (etmq->etm->synth_opts.last_branch) {
> + from = cs_etm__last_executed_instr(tidq->prev_packet);
> + to = cs_etm__first_executed_instr(tidq->packet);
> +
> + size = cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from);
> +
> + /* Enable callchain so thread stack entry can be allocated */
> + thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu,
> + tidq->prev_packet->flags, from, to, size,
> + etmq->buffer->buffer_nr + 1, false,
> + tidq->br_stack_sz, 0);
[Severity: Medium]
When CS ETM traces are processed with db-export (e.g., via a Python script),
could this integration with thread_stack__event() expose a NULL pointer
dereference?
If db-export supersedes the stack, thread_stack__process() recreates ts with
br_stack_sz = 0:
tools/perf/util/thread-stack.c:thread_stack__process() {
...
thread_stack__reset(thread, ts);
ts = NULL;
...
}
However, subsequent trace packets cause CS ETM to call thread_stack__event()
with a br_stack_sz greater than 0, which blindly invokes
thread_stack__update_br_stack(). Can this dereference a now-NULL
ts->br_stack_rb?
tools/perf/util/thread-stack.c:thread_stack__update_br_stack() {
...
be = &bs->entries[ts->br_stack_pos];
...
}
> + } else {
> + thread_stack__set_trace_nr(tidq->frontend_thread,
> + tidq->prev_packet->cpu,
> + etmq->buffer->buffer_nr + 1);
> + }
> +}
> +
> +static void cs_etm__sample_branch_stack(struct cs_etm_auxtrace *etm,
> + struct cs_etm_traceid_queue *tidq,
> + struct perf_sample *sample)
> +{
> + if (etm->synth_opts.last_branch) {
> + thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu,
> + tidq->last_branch, tidq->br_stack_sz);
[Severity: High]
This is a pre-existing issue, but I wanted to ask about a potential heap
buffer overflow in thread_stack__br_sample() when copying ring buffer
elements.
When the branch stack ring buffer wraps around, the function copies the
remaining older elements:
tools/perf/util/thread-stack.c:thread_stack__br_sample() {
...
if (src->nr >= ts->br_stack_sz) {
sz -= nr;
be = &dst->entries[nr];
nr = min(ts->br_stack_pos, sz);
memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);
}
...
}
Since the second memcpy() uses bsz * ts->br_stack_pos instead of bsz * nr,
and ts->br_stack_sz can be dynamically allocated to be much larger than the
requested size (for example, intel-pt allocates requested_size + 1024),
could ts->br_stack_pos exceed the remaining capacity of the destination
buffer and cause an out-of-bounds write?
> + sample->branch_stack = tidq->last_branch;
> + }
> +}
[ ... ]
> @@ -2025,7 +1971,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>
> /* Reset last branches after flush the trace */
> if (etm->synth_opts.last_branch)
> - cs_etm__reset_last_branch_rb(tidq);
> + thread_stack__flush(tidq->frontend_thread);
[Severity: Medium]
Does calling thread_stack__flush() with the frontend_thread clear the branch
stacks for all CPUs?
The thread_stack__flush() function iterates over all per-CPU thread stacks
allocated for the given thread struct:
tools/perf/util/thread-stack.c:thread_stack__flush() {
...
for (pos = 0; pos < ts->arr_sz; pos++) {
...
}
In CPU-wide tracing modes, frontend_thread represents the shared idle thread
across all CPUs. Could a trace discontinuity on a single CPU inadvertently
wipe out the active branch stacks for all other unaffected CPUs instead of
restricting the flush to tidq->prev_packet->cpu?
>
> return err;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-b4-arm_cs_callchain_support_v1-v10-0-e8b6e5d63db5@arm.com?part=5
next prev parent reply other threads:[~2026-06-17 20:56 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-17 15:08 [PATCH v10 0/9] perf cs-etm: Support thread stack and callchain Leo Yan
2026-06-17 15:08 ` [PATCH v10 1/9] perf cs-etm: Fix thread leaks on trace queue init failure Leo Yan
2026-06-17 20:25 ` sashiko-bot
2026-06-17 15:08 ` [PATCH v10 2/9] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-06-17 15:08 ` [PATCH v10 3/9] perf cs-etm: Decode ETE exception packets Leo Yan
2026-06-17 15:08 ` [PATCH v10 4/9] perf cs-etm: Refactor instruction size handling Leo Yan
2026-06-17 15:08 ` [PATCH v10 5/9] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-06-17 20:56 ` sashiko-bot [this message]
2026-06-17 15:08 ` [PATCH v10 6/9] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-06-17 21:08 ` sashiko-bot
2026-06-17 15:08 ` [PATCH v10 7/9] perf cs-etm: Support call indentation Leo Yan
2026-06-17 21:20 ` sashiko-bot
2026-06-17 15:08 ` [PATCH v10 8/9] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-06-17 21:35 ` sashiko-bot
2026-06-17 15:09 ` [PATCH v10 9/9] perf test: Add Arm CoreSight callchain test Leo Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260617205609.D73851F00A3A@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=leo.yan@arm.com \
--cc=linux-perf-users@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.