Re: [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries

Linux Perf Users
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries
Date: Thu, 11 Jun 2026 17:18:46 +0000	[thread overview]
Message-ID: <20260611171846.DC8031F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260611-b4-arm_cs_callchain_support_v1-v8-4-737948584fea@arm.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Medium] Branch history and callchain are corrupted in CPU-wide trace mode due to sharing a single `thread_stack` across all CPUs.
- [Medium] Improper handling of trace discontinuity via `thread_stack__flush()` causes stale branches and cross-CPU corruption.
- [Medium] Missing NULL check on thread allocation leads to segmentation fault in newly added `thread_stack__flush()`.

Pre-existing issues:
- [Critical] Heap buffer overflow (out-of-bounds write) in `thread_stack__br_sample` due to incorrect `memcpy` length.
--

commit 0a266f6d8db7cc2c2667bdbd916921a70696b02b
Author: Leo Yan <leo.yan@arm.com>

perf cs-etm: Use thread-stack for last branch entries

This commit replaces the custom CoreSight ETM last branch circular array
with the common thread_stack helpers. It records taken branches with
thread_stack__event() and synthesizes PERF_SAMPLE_BRANCH_STACK data with
thread_stack__br_sample().

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4127120459418..8798bf0471faf 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c

[ ... ]

> @@ -1614,6 +1527,42 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
>  		return etm->latest_kernel_timestamp;
>  }
>  
> +static bool cs_etm__packet_has_taken_branch(struct cs_etm_packet *packet)
> +{
> +	if (packet->sample_type == CS_ETM_RANGE &&
> +	    packet->last_instr_taken_branch)
> +		return true;
> +
> +	return false;
> +}
> +
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> +				    struct cs_etm_traceid_queue *tidq)
> +{
> +	u64 from, to;
> +	int size;
> +
> +	if (!cs_etm__packet_has_taken_branch(tidq->prev_packet))
> +		return;
> +
> +	if (etmq->etm->synth_opts.last_branch) {
> +		from = cs_etm__last_executed_instr(tidq->prev_packet);
> +		to = cs_etm__first_executed_instr(tidq->packet);
> +
> +		size = cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from);
> +
> +		/* Enable callchain so thread stack entry can be allocated */
> +		thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu,
> +				    tidq->prev_packet->flags, from, to, size,
> +				    etmq->buffer->buffer_nr + 1, false,
> +				    tidq->br_stack_sz, 0);

[Severity: Medium]
In CPU-wide tracing mode where hardware context IDs are absent, queue->tid
is initialized to -1. machine__findnew_thread() assigns the same dummy
thread (TID=-1) to all trace queues. Because thread_stack__per_cpu() only
handles TID=0 for idle tasks, all CPU trace queues will share the exact
same thread_stack object.

Does this cause thread_stack__event() to push branches from multiple CPUs
into the same shared branch stack ring buffer, corrupting the trace output
by interleaving them?

> +	} else {
> +		thread_stack__set_trace_nr(tidq->frontend_thread,
> +					   tidq->prev_packet->cpu,
> +					   etmq->buffer->buffer_nr + 1);
> +	}
> +}
> +

[ ... ]

> @@ -1644,8 +1593,11 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  
>  	cs_etm__copy_insn(etmq, tidq, packet, &sample);
>  
> -	if (etm->synth_opts.last_branch)
> +	if (etm->synth_opts.last_branch) {
> +		thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu,
> +					tidq->last_branch, tidq->br_stack_sz);

[Severity: Critical]
This is a pre-existing issue, but calling thread_stack__br_sample() can
trigger an out-of-bounds write in that function.

In thread_stack__br_sample(), when copying from the wrapped ring buffer, it
calculates bounds using nr but ignores it in memcpy:

	nr = min(ts->br_stack_pos, sz);
	memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);

If the destination buffer size sz is smaller than ts->br_stack_pos, does the
memcpy write beyond the allocated bounds for nr elements?

>  		sample.branch_stack = tidq->last_branch;
> +	}

[ ... ]

> @@ -2020,7 +1956,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)
> -		cs_etm__reset_last_branch_rb(tidq);
> +		thread_stack__flush(tidq->frontend_thread);

[Severity: Medium]
In cs_etm__init_traceid_queue(), the return value of
machine__findnew_thread() is not checked, meaning tidq->frontend_thread
could be NULL upon memory allocation failure.

Does thread_stack__flush() dereference tidq->frontend_thread without
validation, potentially leading to a segmentation fault on error paths?

[Severity: Medium]
Looking at thread_stack__flush(), it clears the call stack but does not
reset the branch stack array (ts->br_stack_pos and ts->br_stack_rb remain
intact).

Will this leave stale branch entries from before a trace gap to be prepended
to the new trace segment?

Additionally, thread_stack__flush() iterates and flushes stacks for all
CPUs associated with the thread. For a shared dummy thread (TID=-1) or the
idle thread (TID=0), does this inappropriately flush and destroy in-progress
histories for all other CPUs?

>  
>  	return err;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611-b4-arm_cs_callchain_support_v1-v8-0-737948584fea@arm.com?part=4

next prev parent reply	other threads:[~2026-06-11 17:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 14:50 [PATCH v8 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-06-11 14:50 ` [PATCH v8 1/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-06-11 14:50 ` [PATCH v8 2/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-06-11 14:50 ` [PATCH v8 3/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-06-11 14:50 ` [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-06-11 17:18   ` sashiko-bot [this message]
2026-06-11 14:50 ` [PATCH v8 5/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-06-11 15:06   ` sashiko-bot
2026-06-11 18:00     ` Leo Yan
2026-06-11 18:06     ` Leo Yan
2026-06-11 14:50 ` [PATCH v8 6/8] perf cs-etm: Support call indentation Leo Yan
2026-06-11 15:11   ` sashiko-bot
2026-06-11 14:50 ` [PATCH v8 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-06-11 15:05   ` sashiko-bot
2026-06-11 14:50 ` [PATCH v8 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-06-11 15:06   ` James Clark

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260611171846.DC8031F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=leo.yan@arm.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox