Re: [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Leo Yan" <leo.yan@arm.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries
Date: Thu, 11 Jun 2026 17:18:46 +0000	[thread overview]
Message-ID: <20260611171846.DC8031F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260611-b4-arm_cs_callchain_support_v1-v8-4-737948584fea@arm.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Medium] Branch history and callchain are corrupted in CPU-wide trace mode due to sharing a single `thread_stack` across all CPUs.
- [Medium] Improper handling of trace discontinuity via `thread_stack__flush()` causes stale branches and cross-CPU corruption.
- [Medium] Missing NULL check on thread allocation leads to segmentation fault in newly added `thread_stack__flush()`.

Pre-existing issues:
- [Critical] Heap buffer overflow (out-of-bounds write) in `thread_stack__br_sample` due to incorrect `memcpy` length.
--

commit 0a266f6d8db7cc2c2667bdbd916921a70696b02b
Author: Leo Yan <leo.yan@arm.com>

perf cs-etm: Use thread-stack for last branch entries

This commit replaces the custom CoreSight ETM last branch circular array
with the common thread_stack helpers. It records taken branches with
thread_stack__event() and synthesizes PERF_SAMPLE_BRANCH_STACK data with
thread_stack__br_sample().

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 4127120459418..8798bf0471faf 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c

[ ... ]

> @@ -1614,6 +1527,42 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
>  		return etm->latest_kernel_timestamp;
>  }
>  
> +static bool cs_etm__packet_has_taken_branch(struct cs_etm_packet *packet)
> +{
> +	if (packet->sample_type == CS_ETM_RANGE &&
> +	    packet->last_instr_taken_branch)
> +		return true;
> +
> +	return false;
> +}
> +
> +static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
> +				    struct cs_etm_traceid_queue *tidq)
> +{
> +	u64 from, to;
> +	int size;
> +
> +	if (!cs_etm__packet_has_taken_branch(tidq->prev_packet))
> +		return;
> +
> +	if (etmq->etm->synth_opts.last_branch) {
> +		from = cs_etm__last_executed_instr(tidq->prev_packet);
> +		to = cs_etm__first_executed_instr(tidq->packet);
> +
> +		size = cs_etm__instr_size(etmq, tidq, tidq->prev_packet, from);
> +
> +		/* Enable callchain so thread stack entry can be allocated */
> +		thread_stack__event(tidq->frontend_thread, tidq->prev_packet->cpu,
> +				    tidq->prev_packet->flags, from, to, size,
> +				    etmq->buffer->buffer_nr + 1, false,
> +				    tidq->br_stack_sz, 0);

[Severity: Medium]
In CPU-wide tracing mode where hardware context IDs are absent, queue->tid
is initialized to -1. machine__findnew_thread() assigns the same dummy
thread (TID=-1) to all trace queues. Because thread_stack__per_cpu() only
handles TID=0 for idle tasks, all CPU trace queues will share the exact
same thread_stack object.

Does this cause thread_stack__event() to push branches from multiple CPUs
into the same shared branch stack ring buffer, corrupting the trace output
by interleaving them?

> +	} else {
> +		thread_stack__set_trace_nr(tidq->frontend_thread,
> +					   tidq->prev_packet->cpu,
> +					   etmq->buffer->buffer_nr + 1);
> +	}
> +}
> +

[ ... ]

> @@ -1644,8 +1593,11 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  
>  	cs_etm__copy_insn(etmq, tidq, packet, &sample);
>  
> -	if (etm->synth_opts.last_branch)
> +	if (etm->synth_opts.last_branch) {
> +		thread_stack__br_sample(tidq->frontend_thread, tidq->packet->cpu,
> +					tidq->last_branch, tidq->br_stack_sz);

[Severity: Critical]
This is a pre-existing issue, but calling thread_stack__br_sample() can
trigger an out-of-bounds write in that function.

In thread_stack__br_sample(), when copying from the wrapped ring buffer, it
calculates bounds using nr but ignores it in memcpy:

	nr = min(ts->br_stack_pos, sz);
	memcpy(be, &src->entries[0], bsz * ts->br_stack_pos);

If the destination buffer size sz is smaller than ts->br_stack_pos, does the
memcpy write beyond the allocated bounds for nr elements?

>  		sample.branch_stack = tidq->last_branch;
> +	}

[ ... ]

> @@ -2020,7 +1956,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)
> -		cs_etm__reset_last_branch_rb(tidq);
> +		thread_stack__flush(tidq->frontend_thread);

[Severity: Medium]
In cs_etm__init_traceid_queue(), the return value of
machine__findnew_thread() is not checked, meaning tidq->frontend_thread
could be NULL upon memory allocation failure.

Does thread_stack__flush() dereference tidq->frontend_thread without
validation, potentially leading to a segmentation fault on error paths?

[Severity: Medium]
Looking at thread_stack__flush(), it clears the call stack but does not
reset the branch stack array (ts->br_stack_pos and ts->br_stack_rb remain
intact).

Will this leave stale branch entries from before a trace gap to be prepended
to the new trace segment?

Additionally, thread_stack__flush() iterates and flushes stacks for all
CPUs associated with the thread. For a shared dummy thread (TID=-1) or the
idle thread (TID=0), does this inappropriately flush and destroy in-progress
histories for all other CPUs?

>  
>  	return err;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611-b4-arm_cs_callchain_support_v1-v8-0-737948584fea@arm.com?part=4

next prev parent reply	other threads:[~2026-06-11 17:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 14:50 [PATCH v8 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-06-11 14:50 ` [PATCH v8 1/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-06-11 14:50 ` [PATCH v8 2/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-06-11 14:50 ` [PATCH v8 3/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-06-11 14:50 ` [PATCH v8 4/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-06-11 17:18   ` sashiko-bot [this message]
2026-06-11 14:50 ` [PATCH v8 5/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-06-11 15:06   ` sashiko-bot
2026-06-11 18:00     ` Leo Yan
2026-06-11 18:06     ` Leo Yan
2026-06-11 14:50 ` [PATCH v8 6/8] perf cs-etm: Support call indentation Leo Yan
2026-06-11 15:11   ` sashiko-bot
2026-06-11 14:50 ` [PATCH v8 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-06-11 15:05   ` sashiko-bot
2026-06-11 14:50 ` [PATCH v8 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-06-11 15:06   ` James Clark

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260611171846.DC8031F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=leo.yan@arm.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.