Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain
@ 2026-05-26 16:59 Leo Yan
  2026-05-26 16:59 ` [PATCH v6 1/8] perf cs-etm: Decode ETE exception packets Leo Yan
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users, Leo Yan

This series adds thread-stack and synthesized callchain support for Arm
CoreSight. It derives from an older series [1] but has been heavily
rewritten.

CS ETM previously kept last-branch state in a per-trace-queue buffer.
That effectively makes the state per CPU, while the call/return history
belongs to a thread. This series moves branch tracking to the common
thread-stack code.

The series records CoreSight branches with thread_stack__event(), uses
thread_stack__br_sample() for last branch entries, and flushes thread
stacks after decoder resets.

A decoder reset between AUX trace buffers is treated as a global trace
discontinuity, so all thread stacks are flushed. This avoids carrying
stale call/return history across the discontinuity.
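
As a rough illustration of the idea, the sketch below keeps call/return
history per thread and drops it on a discontinuity. It is a standalone
toy, not perf's thread-stack code; struct toy_stack, toy_branch() and
toy_flush() are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEPTH 64

enum toy_branch_type { TOY_CALL, TOY_RETURN, TOY_JUMP };

/* One stack per thread, not per CPU or per trace queue (toy model). */
struct toy_stack {
	uint64_t ret_addr[TOY_MAX_DEPTH];
	size_t depth;
};

/* Record one branch: a call pushes its return address, a return pops. */
static void toy_branch(struct toy_stack *ts, enum toy_branch_type type,
		       uint64_t from, uint64_t to, unsigned int insn_len)
{
	if (type == TOY_CALL) {
		if (ts->depth < TOY_MAX_DEPTH)
			ts->ret_addr[ts->depth++] = from + insn_len;
	} else if (type == TOY_RETURN) {
		/* Only pop when the target matches the expected return. */
		if (ts->depth && ts->ret_addr[ts->depth - 1] == to)
			ts->depth--;
	}
	/* Plain jumps do not change the call depth in this sketch. */
}

/* After a trace discontinuity the history cannot be trusted: drop it. */
static void toy_flush(struct toy_stack *ts)
{
	ts->depth = 0;
}
```

Because the stack belongs to the thread, a context switch on the same
CPU does not mix the histories of two threads, and a flush simply
restarts the depth from zero.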

One limitation remains for instructions emulated by the kernel. In that
case the exception return address may not match the return address
stored in the thread stack, because execution after the exception return
can resume one instruction ahead. The stack can still recover when a
later return matches an upper caller. Emulated instructions are not a
common target for performance callchain analysis, and supporting them
would require extending the common thread-stack path to accept both the
real target address and an adjusted address for stack matching, so this
series leaves that extra complexity out.
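
The recovery behaviour can be pictured with a small standalone sketch.
It assumes a stack of expected return addresses and a pop that searches
downwards for a match; toy_pop_match() is a hypothetical helper for
illustration and is not taken from the thread-stack code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pop down to the entry whose expected return address equals 'to'.
 * Returns the new depth; the depth is unchanged when nothing matches.
 */
static size_t toy_pop_match(const uint64_t *ret_addr, size_t depth,
			    uint64_t to)
{
	size_t i;

	for (i = depth; i > 0; i--) {
		if (ret_addr[i - 1] == to)
			return i - 1;
	}
	return depth;
}
```

With an emulated instruction, the entry pushed at exception entry expects
a return to the trapping instruction, but execution resumes one
instruction later. The first pop finds no match and leaves a stale
entry; a later return to an outer caller matches a lower entry and
discards the stale one along with it.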

The series has been tested on an Orion6 board:

  perf test 150 -vvv

  150: Check Arm CoreSight synthesized callchain:
  --- start ---
  test child forked, pid 13528
  Test callchain push: PASS
  Test callchain pop: PASS
  ---- end(0) ----
  150: Check Arm CoreSight synthesized callchain                       : Ok

  perf script --itrace=g16i10il64

  callchain_test   17468 [005] 1031003.229943:         10 instructions:
              aaaac32507c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
              ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
              ffff90bd233c call_init+0x9c (inlined)
              ffff90bd233c __libc_start_main_impl+0x9c (inlined)
              aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)

  callchain_test   17468 [005] 1031003.229943:         10 instructions:
              aaaac3250774 do_svc+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
              ffff90bd233c call_init+0x9c (inlined)
              ffff90bd233c __libc_start_main_impl+0x9c (inlined)
              aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)

  callchain_test   17468 [005] 1031003.229944:         10 instructions:
          ffff800080010c20 vectors+0x420 ([kernel.kallsyms])
              aaaac3250784 do_svc+0x1c (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
              ffff90bd233c call_init+0x9c (inlined)
              ffff90bd233c __libc_start_main_impl+0x9c (inlined)
              aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)

Note that the test fails on the Juno board because of many discontinuity
packets (mainly NO_SYNC elements). This is likely caused by FIFO
overflow on the path.

[1] https://lore.kernel.org/linux-arm-kernel/20200220052701.7754-1-leo.yan@linaro.org/

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
Leo Yan (8):
      perf cs-etm: Decode ETE exception packets
      perf cs-etm: Refactor instruction size handling
      perf cs-etm: Use thread-stack for last branch entries
      perf cs-etm: Flush thread stacks after decoder reset
      perf cs-etm: Support call indentation
      perf cs-etm: Filter synthesized branch samples
      perf cs-etm: Synthesize callchains for instruction samples
      perf test: Add Arm CoreSight callchain test

 .../tests/shell/test_arm_coresight_callchain.sh    | 235 ++++++++++++++++
 tools/perf/util/cs-etm.c                           | 309 ++++++++++++---------
 2 files changed, 408 insertions(+), 136 deletions(-)
---
base-commit: bd2a5be1fe731bc7548205dd148db75f1d588da2
change-id: 20260521-b4-arm_cs_callchain_support_v1-2c2a70719bcc

Best regards,
-- 
Leo Yan <leo.yan@arm.com>




* [PATCH v6 1/8] perf cs-etm: Decode ETE exception packets
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-26 16:59 ` [PATCH v6 2/8] perf cs-etm: Refactor instruction size handling Leo Yan
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users

ETE shares the same packet format as ETMv4, but exception decoding
handled ETMv4 packets only. As a result, ETE exception packets were
not classified.

Recognize the ETE magic for exception number decoding.
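
For readers skimming the diff, the check amounts to treating both
formats as sharing one exception numbering. A minimal standalone sketch
follows; the TOY_* constants are placeholder values for illustration
only, and toy_uses_etmv4_exceptions() is a hypothetical helper that this
patch does not add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder magic values, chosen only for this sketch. */
#define TOY_ETMV3_MAGIC	0x3030303030303030ULL
#define TOY_ETMV4_MAGIC	0x4040404040404040ULL
#define TOY_ETE_MAGIC	0x5050505050505050ULL

/* ETE reuses the ETMv4 packet format, so both share exception numbers. */
static bool toy_uses_etmv4_exceptions(uint64_t magic)
{
	return magic == TOY_ETMV4_MAGIC || magic == TOY_ETE_MAGIC;
}
```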

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/cs-etm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 6ec48de29441012f3d827d50616349c6c0d1f037..ab79d08f5a6095448470e2c3ec85ff3db2fb5634 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2138,7 +2138,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
 	 * HVC cases; need to check if it's SVC instruction based on
 	 * packet address.
 	 */
-	if (magic == __perf_cs_etmv4_magic) {
+	if (magic == __perf_cs_etmv4_magic || magic == __perf_cs_ete_magic) {
 		if (packet->exception_number == CS_ETMV4_EXC_CALL &&
 		    cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
 					 prev_packet->end_addr))
@@ -2161,7 +2161,7 @@ static bool cs_etm__is_async_exception(struct cs_etm_traceid_queue *tidq,
 		    packet->exception_number == CS_ETMV3_EXC_FIQ)
 			return true;
 
-	if (magic == __perf_cs_etmv4_magic)
+	if (magic == __perf_cs_etmv4_magic || magic == __perf_cs_ete_magic)
 		if (packet->exception_number == CS_ETMV4_EXC_RESET ||
 		    packet->exception_number == CS_ETMV4_EXC_DEBUG_HALT ||
 		    packet->exception_number == CS_ETMV4_EXC_SYSTEM_ERROR ||
@@ -2192,7 +2192,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
 		    packet->exception_number == CS_ETMV3_EXC_GENERIC)
 			return true;
 
-	if (magic == __perf_cs_etmv4_magic) {
+	if (magic == __perf_cs_etmv4_magic || magic == __perf_cs_ete_magic) {
 		if (packet->exception_number == CS_ETMV4_EXC_TRAP ||
 		    packet->exception_number == CS_ETMV4_EXC_ALIGNMENT ||
 		    packet->exception_number == CS_ETMV4_EXC_INST_FAULT ||

-- 
2.34.1




* [PATCH v6 2/8] perf cs-etm: Refactor instruction size handling
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
  2026-05-26 16:59 ` [PATCH v6 1/8] perf cs-etm: Decode ETE exception packets Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-26 16:59 ` [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users, Leo Yan

From: Leo Yan <leo.yan@linaro.org>

This patch introduces a new function cs_etm__instr_size() to calculate
the instruction size based on the ISA type and instruction address.

Trace data can be megabytes in size and is most likely A64/A32 on many
platforms, so cs_etm__instr_addr() keeps a single ISA type check for
A64/A32 and uses an optimized calculation (addr + offset * 4).
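
The two paths can be sketched outside perf as follows. This is a
simplified model that reads instruction bytes from a flat array indexed
by address, where the real code goes through cs_etm__mem_access(); the
toy_* names are hypothetical, and only the T32 size rule is taken from
the existing cs_etm__t32_instr_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_isa { TOY_ISA_A64, TOY_ISA_A32, TOY_ISA_T32 };

/*
 * T32: byte 1 holds bits [15:8] of the first (little-endian) halfword;
 * a value of 0xE8 or above in its top five bits marks a 32-bit encoding.
 * A32/A64 instructions are always 4 bytes.
 */
static int toy_instr_size(enum toy_isa isa, const uint8_t *mem,
			  uint64_t addr)
{
	if (isa == TOY_ISA_T32)
		return ((mem[addr + 1] & 0xF8) >= 0xE8) ? 4 : 2;
	return 4;
}

/* Address of the instruction 'offset' instructions after 'start'. */
static uint64_t toy_instr_addr(enum toy_isa isa, const uint8_t *mem,
			       uint64_t start, uint64_t offset)
{
	uint64_t addr = start;

	/* Fast path: fixed-size instructions need no memory reads. */
	if (isa == TOY_ISA_A64 || isa == TOY_ISA_A32)
		return addr + offset * 4;

	/* Slow path: walk instruction by instruction. */
	while (offset--)
		addr += toy_instr_size(isa, mem, addr);
	return addr;
}
```

The fast path matters because the walk costs one memory read per
instruction, while the fixed-size case is a single multiply and add.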

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/cs-etm.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index ab79d08f5a6095448470e2c3ec85ff3db2fb5634..5bff8811d61e423463b7bd4e20d599d5b5307a1a 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1347,6 +1347,17 @@ static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
 	return ((instrBytes[1] & 0xF8) >= 0xE8) ? 4 : 2;
 }
 
+static inline int cs_etm__instr_size(struct cs_etm_queue *etmq,
+				     u8 trace_chan_id,
+				     enum cs_etm_isa isa, u64 addr)
+{
+	if (isa == CS_ETM_ISA_T32)
+		return cs_etm__t32_instr_size(etmq, trace_chan_id, addr);
+
+	/* Otherwise, 4-byte instruction size for A32/A64 */
+	return 4;
+}
+
 static inline u64 cs_etm__first_executed_instr(struct cs_etm_packet *packet)
 {
 	/*
@@ -1375,19 +1386,18 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 				     const struct cs_etm_packet *packet,
 				     u64 offset)
 {
-	if (packet->isa == CS_ETM_ISA_T32) {
-		u64 addr = packet->start_addr;
+	u64 addr = packet->start_addr;
 
-		while (offset) {
-			addr += cs_etm__t32_instr_size(etmq,
-						       trace_chan_id, addr);
-			offset--;
-		}
-		return addr;
-	}
+	/* 4-byte instruction size for A32/A64 */
+	if (packet->isa == CS_ETM_ISA_A64 || packet->isa == CS_ETM_ISA_A32)
+		return addr + offset * 4;
 
-	/* Assume a 4 byte instruction size (A32/A64) */
-	return packet->start_addr + offset * 4;
+	while (offset) {
+		addr += cs_etm__instr_size(etmq, trace_chan_id,
+					   packet->isa, addr);
+		offset--;
+	}
+	return addr;
 }
 
 static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
@@ -1540,16 +1550,8 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 		return;
 	}
 
-	/*
-	 * T32 instruction size might be 32-bit or 16-bit, decide by calling
-	 * cs_etm__t32_instr_size().
-	 */
-	if (packet->isa == CS_ETM_ISA_T32)
-		sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
-							  sample->ip);
-	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
-	else
-		sample->insn_len = 4;
+	sample->insn_len = cs_etm__instr_size(etmq, trace_chan_id,
+					      packet->isa, sample->ip);
 
 	cs_etm__mem_access(etmq, trace_chan_id, sample->ip, sample->insn_len,
 			   (void *)sample->insn, 0);

-- 
2.34.1




* [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
  2026-05-26 16:59 ` [PATCH v6 1/8] perf cs-etm: Decode ETE exception packets Leo Yan
  2026-05-26 16:59 ` [PATCH v6 2/8] perf cs-etm: Refactor instruction size handling Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-26 16:59 ` [PATCH v6 4/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users

CS ETM maintains its own circular array for last branch entries, with
local helpers to update, copy and reset the branch stack. This duplicates
logic already provided by the common code.

Record branches with thread_stack__event() and synthesize the branch
stack with thread_stack__br_sample(). This removes the local
last_branch_rb buffer and position tracking. Keep the buffer number
updated via thread_stack__set_trace_nr(), which is used when exporting
samples to Python scripts.

The output should remain the same, except that be->flags.predicted is no
longer set. Since CoreSight trace does not provide branch prediction
information, clearing the flag avoids confusion.
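
For context, the removed helpers implemented a ring of the last N
branches, filled from the end downwards and copied out newest first in
two steps. The standalone sketch below models that behaviour only; the
toy_* names are hypothetical and the layout of the common thread-stack
buffer is not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BR_SZ 4

struct toy_branch {
	uint64_t from;
	uint64_t to;
};

struct toy_br_ring {
	struct toy_branch rb[TOY_BR_SZ];
	size_t pos;	/* index of the newest entry */
	size_t nr;	/* number of valid entries */
};

/* Insert towards lower indices, wrapping back to the end of the ring. */
static void toy_br_record(struct toy_br_ring *r, uint64_t from, uint64_t to)
{
	if (!r->pos)
		r->pos = TOY_BR_SZ;
	r->pos--;
	r->rb[r->pos].from = from;
	r->rb[r->pos].to = to;
	if (r->nr < TOY_BR_SZ)
		r->nr++;
}

/* Copy out newest first; returns the number of entries written. */
static size_t toy_br_sample(const struct toy_br_ring *r,
			    struct toy_branch *dst)
{
	size_t first;

	if (!r->nr)
		return 0;

	/* Step 1: from the newest entry to the end of the ring. */
	first = TOY_BR_SZ - r->pos;
	memcpy(dst, &r->rb[r->pos], first * sizeof(*dst));

	/* Step 2: after a wrap, older entries sit at the start. */
	if (r->nr > first)
		memcpy(dst + first, &r->rb[0],
		       (r->nr - first) * sizeof(*dst));
	return r->nr;
}
```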

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/cs-etm.c | 152 +++++++++++++----------------------------------
 1 file changed, 41 insertions(+), 111 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5bff8811d61e423463b7bd4e20d599d5b5307a1a..398ab3b7a429d402cc8e5f6cccb35c0b7c253732 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -83,14 +83,13 @@ struct cs_etm_auxtrace {
 struct cs_etm_traceid_queue {
 	u8 trace_chan_id;
 	u64 period_instructions;
-	size_t last_branch_pos;
 	union perf_event *event_buf;
 	struct thread *thread;
 	struct thread *prev_packet_thread;
 	ocsd_ex_level prev_packet_el;
 	ocsd_ex_level el;
+	unsigned int br_stack_sz;
 	struct branch_stack *last_branch;
-	struct branch_stack *last_branch_rb;
 	struct cs_etm_packet *prev_packet;
 	struct cs_etm_packet *packet;
 	struct cs_etm_packet_queue packet_queue;
@@ -635,9 +634,8 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 		tidq->last_branch = zalloc(sz);
 		if (!tidq->last_branch)
 			goto out_free;
-		tidq->last_branch_rb = zalloc(sz);
-		if (!tidq->last_branch_rb)
-			goto out_free;
+
+		tidq->br_stack_sz = etm->synth_opts.last_branch_sz;
 	}
 
 	tidq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
@@ -647,7 +645,6 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 	return 0;
 
 out_free:
-	zfree(&tidq->last_branch_rb);
 	zfree(&tidq->last_branch);
 	zfree(&tidq->prev_packet);
 	zfree(&tidq->packet);
@@ -941,7 +938,6 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 		thread__zput(tidq->prev_packet_thread);
 		zfree(&tidq->event_buf);
 		zfree(&tidq->last_branch);
-		zfree(&tidq->last_branch_rb);
 		zfree(&tidq->prev_packet);
 		zfree(&tidq->packet);
 		zfree(&tidq);
@@ -1281,57 +1277,6 @@ static int cs_etm__queue_first_cs_timestamp(struct cs_etm_auxtrace *etm,
 	return ret;
 }
 
-static inline
-void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq,
-				 struct cs_etm_traceid_queue *tidq)
-{
-	struct branch_stack *bs_src = tidq->last_branch_rb;
-	struct branch_stack *bs_dst = tidq->last_branch;
-	size_t nr = 0;
-
-	/*
-	 * Set the number of records before early exit: ->nr is used to
-	 * determine how many branches to copy from ->entries.
-	 */
-	bs_dst->nr = bs_src->nr;
-
-	/*
-	 * Early exit when there is nothing to copy.
-	 */
-	if (!bs_src->nr)
-		return;
-
-	/*
-	 * As bs_src->entries is a circular buffer, we need to copy from it in
-	 * two steps.  First, copy the branches from the most recently inserted
-	 * branch ->last_branch_pos until the end of bs_src->entries buffer.
-	 */
-	nr = etmq->etm->synth_opts.last_branch_sz - tidq->last_branch_pos;
-	memcpy(&bs_dst->entries[0],
-	       &bs_src->entries[tidq->last_branch_pos],
-	       sizeof(struct branch_entry) * nr);
-
-	/*
-	 * If we wrapped around at least once, the branches from the beginning
-	 * of the bs_src->entries buffer and until the ->last_branch_pos element
-	 * are older valid branches: copy them over.  The total number of
-	 * branches copied over will be equal to the number of branches asked by
-	 * the user in last_branch_sz.
-	 */
-	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
-		memcpy(&bs_dst->entries[nr],
-		       &bs_src->entries[0],
-		       sizeof(struct branch_entry) * tidq->last_branch_pos);
-	}
-}
-
-static inline
-void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq)
-{
-	tidq->last_branch_pos = 0;
-	tidq->last_branch_rb->nr = 0;
-}
-
 static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
 					 u8 trace_chan_id, u64 addr)
 {
@@ -1400,38 +1345,6 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 	return addr;
 }
 
-static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
-					  struct cs_etm_traceid_queue *tidq)
-{
-	struct branch_stack *bs = tidq->last_branch_rb;
-	struct branch_entry *be;
-
-	/*
-	 * The branches are recorded in a circular buffer in reverse
-	 * chronological order: we start recording from the last element of the
-	 * buffer down.  After writing the first element of the stack, move the
-	 * insert position back to the end of the buffer.
-	 */
-	if (!tidq->last_branch_pos)
-		tidq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
-
-	tidq->last_branch_pos -= 1;
-
-	be       = &bs->entries[tidq->last_branch_pos];
-	be->from = cs_etm__last_executed_instr(tidq->prev_packet);
-	be->to	 = cs_etm__first_executed_instr(tidq->packet);
-	/* No support for mispredict */
-	be->flags.mispred = 0;
-	be->flags.predicted = 1;
-
-	/*
-	 * Increment bs->nr until reaching the number of last branches asked by
-	 * the user on the command line.
-	 */
-	if (bs->nr < etmq->etm->synth_opts.last_branch_sz)
-		bs->nr += 1;
-}
-
 static int cs_etm__inject_event(struct cs_etm_auxtrace *etm, union perf_event *event,
 			       struct perf_sample *sample, u64 type)
 {
@@ -1579,6 +1492,37 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
 		return etm->latest_kernel_timestamp;
 }
 
+static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
+				    struct cs_etm_traceid_queue *tidq)
+{
+	u64 from, to;
+	int size;
+
+	if (!tidq->prev_packet->last_instr_taken_branch)
+		return;
+
+	if (tidq->prev_packet->sample_type != CS_ETM_RANGE ||
+	    tidq->packet->sample_type != CS_ETM_RANGE)
+		return;
+
+	if (etmq->etm->synth_opts.last_branch) {
+		from = cs_etm__last_executed_instr(tidq->prev_packet);
+		to = cs_etm__first_executed_instr(tidq->packet);
+
+		size = cs_etm__instr_size(etmq, tidq->trace_chan_id,
+					  tidq->prev_packet->isa, from);
+
+		/* Enable callchain so thread stack entry can be allocated */
+		thread_stack__event(tidq->thread, tidq->prev_packet->cpu,
+				    tidq->prev_packet->flags, from, to, size,
+				    etmq->buffer->buffer_nr + 1, true,
+				    tidq->br_stack_sz, 0);
+	} else {
+		thread_stack__set_trace_nr(tidq->thread, tidq->prev_packet->cpu,
+					   etmq->buffer->buffer_nr + 1);
+	}
+}
+
 static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 					    struct cs_etm_traceid_queue *tidq,
 					    u64 addr, u64 period)
@@ -1608,8 +1552,12 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
 
-	if (etm->synth_opts.last_branch)
+	if (etm->synth_opts.last_branch) {
+		thread_stack__br_sample(tidq->thread, tidq->packet->cpu,
+					tidq->last_branch,
+					tidq->br_stack_sz);
 		sample.branch_stack = tidq->last_branch;
+	}
 
 	if (etm->synth_opts.inject) {
 		ret = cs_etm__inject_event(etm, event, &sample,
@@ -1798,14 +1746,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 
 	tidq->period_instructions += tidq->packet->instr_count;
 
-	/*
-	 * Record a branch when the last instruction in
-	 * PREV_PACKET is a branch.
-	 */
-	if (etm->synth_opts.last_branch &&
-	    tidq->prev_packet->sample_type == CS_ETM_RANGE &&
-	    tidq->prev_packet->last_instr_taken_branch)
-		cs_etm__update_last_branch_rb(etmq, tidq);
+	cs_etm__add_stack_event(etmq, tidq);
 
 	if (etm->synth_opts.instructions &&
 	    tidq->period_instructions >= etm->instructions_sample_period) {
@@ -1864,10 +1805,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		u64 offset = etm->instructions_sample_period - instrs_prev;
 		u64 addr;
 
-		/* Prepare last branches for instruction sample */
-		if (etm->synth_opts.last_branch)
-			cs_etm__copy_last_branch_rb(etmq, tidq);
-
 		while (tidq->period_instructions >=
 				etm->instructions_sample_period) {
 			/*
@@ -1947,10 +1884,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 	    etmq->etm->synth_opts.instructions &&
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
 		u64 addr;
-
-		/* Prepare last branches for instruction sample */
-		cs_etm__copy_last_branch_rb(etmq, tidq);
-
 		/*
 		 * Generate a last branch event for the branches left in the
 		 * circular buffer at the end of the trace.
@@ -1982,7 +1915,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 
 	/* Reset last branches after flush the trace */
 	if (etm->synth_opts.last_branch)
-		cs_etm__reset_last_branch_rb(tidq);
+		thread_stack__flush(tidq->thread);
 
 	return err;
 }
@@ -2006,9 +1939,6 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
 		u64 addr;
 
-		/* Prepare last branches for instruction sample */
-		cs_etm__copy_last_branch_rb(etmq, tidq);
-
 		/*
 		 * Use the address of the end of the last reported execution
 		 * range.

-- 
2.34.1




* [PATCH v6 4/8] perf cs-etm: Flush thread stacks after decoder reset
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (2 preceding siblings ...)
  2026-05-26 16:59 ` [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-26 16:59 ` [PATCH v6 5/8] perf cs-etm: Support call indentation Leo Yan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users

Perf resets the CoreSight decoder when moving to a new AUX trace buffer,
which causes a global trace discontinuity.

For callchain synthesis, keeping thread-stack state after decoder reset
can leave stale call/return history attached to threads that are decoded
later, producing incorrect synthesized callchains.

Flush all host thread stacks after a decoder reset. When virtualization
is present, flush the guest thread stacks as well.
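
The flush follows a plain for-each-with-callback pattern, applied to the
host and optionally to the guest. A standalone sketch of that pattern is
shown below; the toy_* types and functions are hypothetical stand-ins
and do not mirror perf's struct machine or struct thread.

```c
#include <assert.h>
#include <stddef.h>

struct toy_thread {
	int tid;
	size_t stack_depth;
};

struct toy_machine {
	struct toy_thread *threads;
	size_t nr;
};

typedef int (*toy_thread_cb)(struct toy_thread *thread, void *data);

/* Visit every thread; stop early if the callback reports an error. */
static int toy_for_each_thread(struct toy_machine *m, toy_thread_cb cb,
			       void *data)
{
	size_t i;
	int ret;

	for (i = 0; i < m->nr; i++) {
		ret = cb(&m->threads[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

static int toy_flush_cb(struct toy_thread *thread, void *data)
{
	(void)data;
	thread->stack_depth = 0;
	return 0;
}

/* Always flush the host; flush the guest only when one is traced. */
static void toy_flush_all(struct toy_machine *host, struct toy_machine *guest)
{
	toy_for_each_thread(host, toy_flush_cb, NULL);
	if (guest)
		toy_for_each_thread(guest, toy_flush_cb, NULL);
}
```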

Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/cs-etm.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 398ab3b7a429d402cc8e5f6cccb35c0b7c253732..ea2424175558ddc0a6f20a9de6c30f377facdc52 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1956,6 +1956,37 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
 
 	return 0;
 }
+
+static int cs_etm__flush_stack_cb(struct thread *thread,
+				  void *data __maybe_unused)
+{
+	thread_stack__flush(thread);
+	return 0;
+}
+
+static void cs_etm__flush_machine_stack(struct cs_etm_queue *etmq, pid_t pid)
+{
+	struct machine *machine;
+
+	machine = machines__find(&etmq->etm->session->machines, pid);
+	if (machine)
+		machine__for_each_thread(machine, cs_etm__flush_stack_cb, NULL);
+}
+
+static void cs_etm__flush_all_stack(struct cs_etm_queue *etmq)
+{
+	enum cs_etm_pid_fmt pid_fmt = cs_etm__get_pid_fmt(etmq);
+
+	if (!etmq->etm->synth_opts.last_branch)
+		return;
+
+	cs_etm__flush_machine_stack(etmq, HOST_KERNEL_ID);
+
+	/* Clear the guest stack if virtualization is supported */
+	if (pid_fmt == CS_ETM_PIDFMT_CTXTID2)
+		cs_etm__flush_machine_stack(etmq, DEFAULT_GUEST_KERNEL_ID);
+}
+
 /*
  * cs_etm__get_data_block: Fetch a block from the auxtrace_buffer queue
  *			   if need be.
@@ -1978,6 +2009,12 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
 		ret = cs_etm_decoder__reset(etmq->decoder);
 		if (ret)
 			return ret;
+
+		/*
+		 * Since the decoder is reset, this causes a global trace
+		 * discontinuity. Flush all thread stacks.
+		 */
+		cs_etm__flush_all_stack(etmq);
 	}
 
 	return etmq->buf_len;

-- 
2.34.1




* [PATCH v6 5/8] perf cs-etm: Support call indentation
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (3 preceding siblings ...)
  2026-05-26 16:59 ` [PATCH v6 4/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-26 16:59 ` [PATCH v6 6/8] perf cs-etm: Filter synthesized branch samples Leo Yan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users, Leo Yan

From: Leo Yan <leo.yan@linaro.org>

This commit supports the "callindent" field, which reflects the call
stack depth.

The branch stack is used by both call indentation and the last branch
record, which are separate features. Use a new flag "use_thread_stack"
to track whether the branch stack needs to be recorded.

Before:

  perf script -F +callindent

  callchain_test    9187 [002] 599611.826599:          1 branches: main                                 ffff83312258 __libc_start_call_main+0x78 (/usr/lib/aarch64-linux-gnu/libc.so.6)
  callchain_test    9187 [002] 599611.826599:          1 branches: foo                                  aaaae3ed07c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches: print                                aaaae3ed07ac foo+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches: do_svc                               aaaae3ed0794 print+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                                      aaaae3ed077c do_svc+0x14 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches: vectors                              aaaae3ed0780 do_svc+0x18 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                                  ffff800080010c00 vectors+0x400 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                  ffff800080010c24 vectors+0x424 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                  ffff8000800114dc el0t_64_sync+0xd4 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                  ffff8000800114f8 el0t_64_sync+0xf0 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                  ffff800080011528 el0t_64_sync+0x120 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                  ffff800080011538 el0t_64_sync+0x130 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                  ffff800080011568 el0t_64_sync+0x160 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches: el0t_64_sync_handler             ffff80008001159c el0t_64_sync+0x194 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                  ffff800081829110 el0t_64_sync_handler+0x18 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches: el0t_64_sync_handler             ffff800081829140 el0t_64_sync_handler+0x48 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches: el0_svc                          ffff800081829194 el0t_64_sync_handler+0x9c ([kernel.kallsyms])

After:

  callchain_test    9187 [002] 599611.826599:          1 branches:             main                                                     ffff83312258 __libc_start_call_main+0x78 (/usr/lib/aarch64-linux-gnu/libc.so.6)
  callchain_test    9187 [002] 599611.826599:          1 branches:                 foo                                                  aaaae3ed07c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                     print                                            aaaae3ed07ac foo+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                         do_svc                                       aaaae3ed0794 print+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                                                                      aaaae3ed077c do_svc+0x14 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                             vectors                                  aaaae3ed0780 do_svc+0x18 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                                                                  ffff800080010c00 vectors+0x400 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff800080010c24 vectors+0x424 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff8000800114dc el0t_64_sync+0xd4 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff8000800114f8 el0t_64_sync+0xf0 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff800080011528 el0t_64_sync+0x120 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff800080011538 el0t_64_sync+0x130 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                                                  ffff800080011568 el0t_64_sync+0x160 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                 el0t_64_sync_handler             ffff80008001159c el0t_64_sync+0x194 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                                                  ffff800081829110 el0t_64_sync_handler+0x18 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                     el0t_64_sync_handler         ffff800081829140 el0t_64_sync_handler+0x48 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                     el0_svc                      ffff800081829194 el0t_64_sync_handler+0x9c ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/cs-etm.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index ea2424175558ddc0a6f20a9de6c30f377facdc52..b31d0dd46a45dc365edd7c2f9e9b2eb077ca23db 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -66,6 +66,7 @@ struct cs_etm_auxtrace {
 	bool snapshot_mode;
 	bool data_queued;
 	bool has_virtual_ts; /* Virtual/Kernel timestamps in the trace. */
+	bool use_thread_stack;
 
 	int num_cpu;
 	u64 latest_kernel_timestamp;
@@ -626,7 +627,7 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 	if (!tidq->prev_packet)
 		goto out_free;
 
-	if (etm->synth_opts.last_branch) {
+	if (etm->use_thread_stack) {
 		size_t sz = sizeof(struct branch_stack);
 
 		sz += etm->synth_opts.last_branch_sz *
@@ -1505,7 +1506,7 @@ static void cs_etm__add_stack_event(struct cs_etm_queue *etmq,
 	    tidq->packet->sample_type != CS_ETM_RANGE)
 		return;
 
-	if (etmq->etm->synth_opts.last_branch) {
+	if (etmq->etm->use_thread_stack) {
 		from = cs_etm__last_executed_instr(tidq->prev_packet);
 		to = cs_etm__first_executed_instr(tidq->packet);
 
@@ -1914,7 +1915,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 	cs_etm__packet_swap(etm, tidq);
 
 	/* Reset last branches after flush the trace */
-	if (etm->synth_opts.last_branch)
+	if (etm->use_thread_stack)
 		thread_stack__flush(tidq->thread);
 
 	return err;
@@ -1977,7 +1978,7 @@ static void cs_etm__flush_all_stack(struct cs_etm_queue *etmq)
 {
 	enum cs_etm_pid_fmt pid_fmt = cs_etm__get_pid_fmt(etmq);
 
-	if (!etmq->etm->synth_opts.last_branch)
+	if (!etmq->etm->use_thread_stack)
 		return;
 
 	cs_etm__flush_machine_stack(etmq, HOST_KERNEL_ID);
@@ -3438,6 +3439,7 @@ int cs_etm__process_auxtrace_info_full(union perf_event *event,
 		itrace_synth_opts__set_default(&etm->synth_opts,
 				session->itrace_synth_opts->default_no_sample);
 		etm->synth_opts.callchain = false;
+		etm->synth_opts.thread_stack = session->itrace_synth_opts->thread_stack;
 	}
 
 	etm->session = session;
@@ -3489,6 +3491,10 @@ int cs_etm__process_auxtrace_info_full(union perf_event *event,
 		etm->tc.cap_user_time_zero = tc->cap_user_time_zero;
 		etm->tc.cap_user_time_short = tc->cap_user_time_short;
 	}
+
+	etm->use_thread_stack = etm->synth_opts.thread_stack ||
+				etm->synth_opts.last_branch;
+
 	err = cs_etm__synth_events(etm, session);
 	if (err)
 		goto err_free_queues;

-- 
2.34.1




* [PATCH v6 6/8] perf cs-etm: Filter synthesized branch samples
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (4 preceding siblings ...)
  2026-05-26 16:59 ` [PATCH v6 5/8] perf cs-etm: Support call indentation Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-26 16:59 ` [PATCH v6 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users, Leo Yan

From: Leo Yan <leo.yan@linaro.org>

CS ETM currently emits a branch sample for every decoded branch when
branch synthesis is enabled. This produces unwanted samples when users
request only call or return branches.

Add a branch filter derived from the itrace "calls" and "returns" options.
When no filter is set, keep the existing behavior and emit all branch
samples. When call or return filtering is requested, only synthesize branch
samples whose flags match the selected branch types, including trace
begin and end markers.
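
The rule above can be sketched in plain shell arithmetic. This is only an
illustration of the mask logic, not the perf implementation: the bit values
are assumed to mirror the PERF_IP_FLAG_* definitions, and build_filter and
should_emit are hypothetical helper names.

```sh
#!/bin/sh
# Illustrative flag bits; assumed to mirror PERF_IP_FLAG_* in perf.
FLAG_CALL=$((1 << 1))
FLAG_RETURN=$((1 << 2))
FLAG_TRACE_BEGIN=$((1 << 8))
FLAG_TRACE_END=$((1 << 9))

# build_filter CALLS RETURNS: each argument is 0 or 1, standing in for the
# itrace "calls" and "returns" options. Prints the resulting filter mask.
build_filter()
{
	filter=0
	if [ "$1" -eq 1 ]; then
		filter=$((filter | FLAG_CALL | FLAG_TRACE_BEGIN | FLAG_TRACE_END))
	fi
	if [ "$2" -eq 1 ]; then
		filter=$((filter | FLAG_RETURN | FLAG_TRACE_BEGIN | FLAG_TRACE_END))
	fi
	echo "$filter"
}

# should_emit FILTER FLAGS: succeeds when a branch sample with FLAGS should
# be synthesized. An empty filter keeps every sample.
should_emit()
{
	[ "$1" -eq 0 ] || [ $(($1 & $2)) -ne 0 ]
}

calls_only=$(build_filter 1 0)
should_emit "$calls_only" "$FLAG_CALL" && echo "call: emitted"
should_emit "$calls_only" "$FLAG_RETURN" || echo "return: dropped"
should_emit 0 "$FLAG_RETURN" && echo "no filter: emitted"
```

With only "calls" set, a return branch is dropped while trace begin and end
markers still pass, which matches the behavior described above.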

Before:

  perf script -F +callindent

  callchain_test    9187 [002] 599611.826599:          1 branches:             main                                                     ffff83312258 __libc_start_call_main+0x78 (/usr/lib/aarch64-linux-gnu/libc.so.6)
  callchain_test    9187 [002] 599611.826599:          1 branches:                 foo                                                  aaaae3ed07c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                     print                                            aaaae3ed07ac foo+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                         do_svc                                       aaaae3ed0794 print+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                                                                      aaaae3ed077c do_svc+0x14 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                             vectors                                  aaaae3ed0780 do_svc+0x18 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                                                                  ffff800080010c00 vectors+0x400 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff800080010c24 vectors+0x424 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff8000800114dc el0t_64_sync+0xd4 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff8000800114f8 el0t_64_sync+0xf0 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff800080011528 el0t_64_sync+0x120 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826600:          1 branches:                                                                  ffff800080011538 el0t_64_sync+0x130 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                                                  ffff800080011568 el0t_64_sync+0x160 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                 el0t_64_sync_handler             ffff80008001159c el0t_64_sync+0x194 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                                                  ffff800081829110 el0t_64_sync_handler+0x18 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                     el0t_64_sync_handler         ffff800081829140 el0t_64_sync_handler+0x48 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                     el0_svc                      ffff800081829194 el0t_64_sync_handler+0x9c ([kernel.kallsyms])

After:

  callchain_test    9187 [002] 599611.826599:          1 branches:             main                                                     ffff83312258 __libc_start_call_main+0x78 (/usr/lib/aarch64-linux-gnu/libc.so.6)
  callchain_test    9187 [002] 599611.826599:          1 branches:                 foo                                                  aaaae3ed07c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                     print                                            aaaae3ed07ac foo+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                         do_svc                                       aaaae3ed0794 print+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826599:          1 branches:                             vectors                                  aaaae3ed0780 do_svc+0x18 (/home/kernel/leoy/test_cs_callchain/callchain_test)
  callchain_test    9187 [002] 599611.826601:          1 branches:                                 el0t_64_sync_handler             ffff80008001159c el0t_64_sync+0x194 ([kernel.kallsyms])
  callchain_test    9187 [002] 599611.826601:          1 branches:                                     el0_svc                      ffff800081829194 el0t_64_sync_handler+0x9c ([kernel.kallsyms])

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/cs-etm.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b31d0dd46a45dc365edd7c2f9e9b2eb077ca23db..8d98e772ecb307381b5ed1b4bbc4056e8779b261 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -71,6 +71,7 @@ struct cs_etm_auxtrace {
 	int num_cpu;
 	u64 latest_kernel_timestamp;
 	u32 auxtrace_type;
+	u32 branches_filter;
 	u64 branches_sample_type;
 	u64 branches_id;
 	u64 instructions_sample_type;
@@ -1596,6 +1597,10 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	} dummy_bs;
 	u64 ip;
 
+	if (etm->branches_filter &&
+		!(etm->branches_filter & tidq->prev_packet->flags))
+		return 0;
+
 	ip = cs_etm__last_executed_instr(tidq->prev_packet);
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
@@ -3442,6 +3447,16 @@ int cs_etm__process_auxtrace_info_full(union perf_event *event,
 		etm->synth_opts.thread_stack = session->itrace_synth_opts->thread_stack;
 	}
 
+	if (etm->synth_opts.calls)
+		etm->branches_filter |= PERF_IP_FLAG_CALL |
+					PERF_IP_FLAG_TRACE_BEGIN |
+					PERF_IP_FLAG_TRACE_END;
+
+	if (etm->synth_opts.returns)
+		etm->branches_filter |= PERF_IP_FLAG_RETURN |
+					PERF_IP_FLAG_TRACE_BEGIN |
+					PERF_IP_FLAG_TRACE_END;
+
 	etm->session = session;
 
 	etm->num_cpu = num_cpu;

-- 
2.34.1




* [PATCH v6 7/8] perf cs-etm: Synthesize callchains for instruction samples
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (5 preceding siblings ...)
  2026-05-26 16:59 ` [PATCH v6 6/8] perf cs-etm: Filter synthesized branch samples Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-26 16:59 ` [PATCH v6 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
  2026-05-29 14:57 ` [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Arnaldo Carvalho de Melo
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users, Leo Yan

From: Leo Yan <leo.yan@linaro.org>

CS ETM already records branches into the thread stack, but instruction
samples do not carry synthesized callchains, so the itrace option 'g'
produces no callchain output.

Allocate a callchain buffer per queue and use thread_stack__sample()
when synthesizing instruction samples. Advertise PERF_SAMPLE_CALLCHAIN
on the synthetic instruction event.

Allocate the callchain stack with one more entry than requested, as the
first entry is reserved for storing context information.
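
As a rough check of the sizing, the arithmetic can be reproduced in shell.
This sketch assumes struct ip_callchain is a u64 count followed by u64
entries; callchain_buf_bytes is a hypothetical helper used only for
illustration.

```sh
#!/bin/sh
# Size of one u64 entry and of the u64 "nr" header in struct ip_callchain.
U64_BYTES=8
HEADER_BYTES=8

# callchain_buf_bytes DEPTH: bytes needed for DEPTH caller entries plus
# the one extra slot reserved for the context marker.
callchain_buf_bytes()
{
	echo $((HEADER_BYTES + ($1 + 1) * U64_BYTES))
}

# Depth requested by --itrace=g16
callchain_buf_bytes 16
```

For a requested depth of 16 this prints 144, that is 8 bytes of header plus
17 entries of 8 bytes each.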

After:

  perf script --itrace=g16l64i100

  callchain_test    9187 [002] 599611.826599:          1 instructions:
              aaaae3ed0774 do_svc+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaae3ed0798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaae3ed07b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              aaaae3ed07c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
              ffff8331225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
              ffff8331233c call_init+0x9c (inlined)
              ffff8331233c __libc_start_main_impl+0x9c (inlined)
              aaaae3ed0670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 tools/perf/util/cs-etm.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8d98e772ecb307381b5ed1b4bbc4056e8779b261..90e0beb910156093d8bd0f320bb0210aca95dd26 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -17,6 +17,7 @@
 #include <stdlib.h>
 
 #include "auxtrace.h"
+#include "callchain.h"
 #include "color.h"
 #include "cs-etm.h"
 #include "cs-etm-decoder/cs-etm-decoder.h"
@@ -85,6 +86,7 @@ struct cs_etm_auxtrace {
 struct cs_etm_traceid_queue {
 	u8 trace_chan_id;
 	u64 period_instructions;
+	u64 kernel_start;
 	union perf_event *event_buf;
 	struct thread *thread;
 	struct thread *prev_packet_thread;
@@ -92,6 +94,7 @@ struct cs_etm_traceid_queue {
 	ocsd_ex_level el;
 	unsigned int br_stack_sz;
 	struct branch_stack *last_branch;
+	struct ip_callchain *callchain;
 	struct cs_etm_packet *prev_packet;
 	struct cs_etm_packet *packet;
 	struct cs_etm_packet_queue packet_queue;
@@ -640,6 +643,16 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 		tidq->br_stack_sz = etm->synth_opts.last_branch_sz;
 	}
 
+	if (etm->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		/* Add 1 to callchain_sz for callchain context */
+		sz += (etm->synth_opts.callchain_sz + 1) * sizeof(u64);
+		tidq->callchain = zalloc(sz);
+		if (!tidq->callchain)
+			goto out_free;
+	}
+
 	tidq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
 	if (!tidq->event_buf)
 		goto out_free;
@@ -647,6 +660,7 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 	return 0;
 
 out_free:
+	zfree(&tidq->callchain);
 	zfree(&tidq->last_branch);
 	zfree(&tidq->prev_packet);
 	zfree(&tidq->packet);
@@ -939,6 +953,7 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 		thread__zput(tidq->thread);
 		thread__zput(tidq->prev_packet_thread);
 		zfree(&tidq->event_buf);
+		zfree(&tidq->callchain);
 		zfree(&tidq->last_branch);
 		zfree(&tidq->prev_packet);
 		zfree(&tidq->packet);
@@ -1431,6 +1446,7 @@ static void cs_etm__set_thread(struct cs_etm_queue *etmq,
 		tidq->thread = machine__idle_thread(machine);
 
 	tidq->el = el;
+	tidq->kernel_start = machine__kernel_start(machine);
 }
 
 int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
@@ -1561,6 +1577,25 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 		sample.branch_stack = tidq->last_branch;
 	}
 
+	if (etm->synth_opts.callchain) {
+		if (tidq->kernel_start)
+			thread_stack__sample(tidq->thread, tidq->packet->cpu,
+					     tidq->callchain,
+					     etm->synth_opts.callchain_sz + 1,
+					     sample.ip, tidq->kernel_start);
+		else
+			/*
+			 * Clear the callchain when the kernel start address is
+			 * not available yet. The empty callchain can then be
+			 * consumed by cs_etm__inject_event().
+			 */
+			memset(tidq->callchain, 0,
+			       sizeof(struct ip_callchain) +
+			       (etm->synth_opts.callchain_sz + 1) * sizeof(u64));
+
+		sample.callchain = tidq->callchain;
+	}
+
 	if (etm->synth_opts.inject) {
 		ret = cs_etm__inject_event(etm, event, &sample,
 					   etm->instructions_sample_type);
@@ -1724,6 +1759,9 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
 	}
 
+	if (etm->synth_opts.callchain)
+		attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+
 	if (etm->synth_opts.instructions) {
 		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
 		attr.sample_period = etm->synth_opts.period;
@@ -3457,6 +3495,14 @@ int cs_etm__process_auxtrace_info_full(union perf_event *event,
 					PERF_IP_FLAG_TRACE_BEGIN |
 					PERF_IP_FLAG_TRACE_END;
 
+	if (etm->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			etm->synth_opts.callchain = false;
+		}
+	}
+
 	etm->session = session;
 
 	etm->num_cpu = num_cpu;
@@ -3508,7 +3554,8 @@ int cs_etm__process_auxtrace_info_full(union perf_event *event,
 	}
 
 	etm->use_thread_stack = etm->synth_opts.thread_stack ||
-				etm->synth_opts.last_branch;
+				etm->synth_opts.last_branch ||
+				etm->synth_opts.callchain;
 
 	err = cs_etm__synth_events(etm, session);
 	if (err)

-- 
2.34.1




* [PATCH v6 8/8] perf test: Add Arm CoreSight callchain test
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (6 preceding siblings ...)
  2026-05-26 16:59 ` [PATCH v6 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
@ 2026-05-26 16:59 ` Leo Yan
  2026-05-29 14:57 ` [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Arnaldo Carvalho de Melo
  8 siblings, 0 replies; 10+ messages in thread
From: Leo Yan @ 2026-05-26 16:59 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, John Garry, Will Deacon, James Clark,
	Mike Leach, Suzuki K Poulose, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Al Grant, Paschalis Mpeis, Amir Ayupov
  Cc: linux-arm-kernel, coresight, linux-perf-users

Add a shell test for synthesized callchains from Arm CoreSight trace.
The test runs only on arm64 systems where the cs_etm event and gcc are
available.

Build a small test program that issues a syscall, record it with
CoreSight trace data, and decode the trace with itrace callchain
synthesis enabled. Verify the callchain as frames are pushed (calls and
syscall entry) and popped (syscall exit and returns).
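
The multi-line matching used by the test relies on GNU grep with -P and -z.
A minimal sketch of that technique follows; the sample text is made up for
illustration and matches_block is a hypothetical helper, not a function in
the test script.

```sh
#!/bin/sh
# matches_block REGEX FILE: succeeds when REGEX matches across lines.
# -z makes grep treat the whole file as one record, so "\n" in a Perl
# regex (-P) can match the newlines between callchain entries.
matches_block()
{
	grep -Pzo "$1" "$2" >/dev/null
}

# Made-up sample resembling two callchain entries under one sample header.
sample=$(mktemp) || exit 1
printf '%s\n' \
	'callchain 1 [000] 10 instructions:' \
	'        aaaa0000 foo+0x4 (/tmp/callchain)' \
	'        aaaa0010 main+0xc (/tmp/callchain)' > "$sample"

regex='instructions:[[:space:]]*\n[[:space:]]+[[:xdigit:]]+ foo\+0x[[:xdigit:]]+ .*\n[[:space:]]+[[:xdigit:]]+ main\+0xc '

if matches_block "$regex" "$sample"; then
	echo "push: PASS"
else
	echo "push: FAIL"
fi
rm -f "$sample"
```

Because the entries must appear in the stated order, the same regex fails
when foo and main are swapped, which is how the test tells a correct stack
from a corrupted one.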

After:

  perf test 150 -vvv

  150: Check Arm CoreSight synthesized callchain:
  --- start ---
  test child forked, pid 13528
  Test callchain push: PASS
  Test callchain pop: PASS
  ---- end(0) ----
  150: Check Arm CoreSight synthesized callchain                       : Ok

Assisted-by: Codex:GPT-5.5
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
 .../tests/shell/test_arm_coresight_callchain.sh    | 235 +++++++++++++++++++++
 1 file changed, 235 insertions(+)

diff --git a/tools/perf/tests/shell/test_arm_coresight_callchain.sh b/tools/perf/tests/shell/test_arm_coresight_callchain.sh
new file mode 100755
index 0000000000000000000000000000000000000000..0e5a5d1129ae7d34f8e0c5942fb62d27db3e862d
--- /dev/null
+++ b/tools/perf/tests/shell/test_arm_coresight_callchain.sh
@@ -0,0 +1,235 @@
+#!/bin/bash
+# Check Arm CoreSight synthesized callchain (exclusive)
+# SPDX-License-Identifier: GPL-2.0
+
+glb_err=1
+
+if ! tmpdir=$(mktemp -d /tmp/perf-cs-callchain-test.XXXXXX); then
+	echo "mktemp failed"
+	exit 1
+fi
+
+cleanup_files()
+{
+	rm -rf "$tmpdir"
+}
+
+trap cleanup_files EXIT
+trap 'cleanup_files; exit $glb_err' TERM INT
+
+skip_if_system_is_not_ready()
+{
+	[ "$(uname -m)" = "aarch64" ] || {
+		echo "Skip: arm64 only test" >&2
+		return 2
+	}
+
+	perf list | grep -q 'cs_etm//' || {
+		echo "Skip: cs_etm event is not available" >&2
+		return 2
+	}
+
+	command -v gcc >/dev/null 2>&1 || {
+		echo "Skip: gcc is not available" >&2
+		return 2
+	}
+
+	return 0
+}
+
+build_test_program()
+{
+	local src=$1
+	local bin=$2
+
+	gcc -g -O0 -o "$bin" "$src"
+}
+
+record_trace()
+{
+	local bin=$1
+	local data=$2
+	local script=$3
+
+	perf record -m ,32M -o "$data" --per-thread -e cs_etm// -- "$bin" >/dev/null 2>&1 &&
+	perf script --itrace=g16i10il64 -i "$data" > "$script"
+}
+
+check_regex()
+{
+	local name=$1
+	local regex=$2
+	local script=$3
+
+	if grep -Pzo "$regex" "$script" >/dev/null; then
+		echo "Test $name: PASS"
+		return 0
+	else
+		echo "Test $name: FAIL"
+		return 1
+	fi
+}
+
+run_test()
+{
+	local name=$1
+	local src=$tmpdir/$name.S
+	local bin=$tmpdir/$name
+	local data=$tmpdir/perf.$name.data
+	local script=$tmpdir/perf.$name.script
+	local regex
+
+	"${name}_src" "$src"
+
+	if ! build_test_program "$src" "$bin"; then
+		echo "$name: build failed"
+		return
+	fi
+
+	if ! record_trace "$bin" "$data" "$script"; then
+		echo "$name: perf record/script failed"
+		return
+	fi
+
+	regex=$("${name}_push_regex")
+	check_regex "${name} push" "$regex" "$script" || return
+
+	regex=$("${name}_pop_regex")
+	check_regex "${name} pop" "$regex" "$script" || return
+
+	glb_err=0
+}
+
+callchain_src()
+{
+	cat > "$1" <<'EOF'
+/* callchain.S */
+	.text
+
+	.global do_svc
+	.type do_svc, %function
+do_svc:
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+
+	mov	x0, #1
+	adr	x1, msg
+	mov	x2, #23
+	mov	x8, #64
+
+	nop
+	nop	// Pad nops for 9 insns before svc
+
+	b	1f
+1:
+	svc	#0
+
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop	// Pad nops for 10 insns after svc
+
+	ldp	x29, x30, [sp], #16
+	ret
+	.size do_svc, .-do_svc
+
+	.global foo
+	.type foo, %function
+foo:
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop	// Pad nops for 9 insns before call
+
+	bl	do_svc
+
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop
+	nop	// Pad nops for 10 insns after call
+
+	ldp	x29, x30, [sp], #16
+	ret
+	.size foo, .-foo
+
+	.global main
+	.type main, %function
+main:
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+
+	bl	foo
+
+	mov	w0, #0
+	ldp	x29, x30, [sp], #16
+	ret
+	.size main, .-main
+
+	.section .rodata
+msg:
+	.asciz	"hello from svc syscall\n"
+EOF
+}
+
+callchain_push_regex()
+{
+	printf '%s' \
+'callchain[[:space:]]+[0-9]+ \[[0-9]+\][[:space:]]+10 instructions:[[:space:]]*\n'\
+'[[:space:]]+[[:xdigit:]]+ foo\+0x[[:xdigit:]]+ \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ main\+0xc \(.*/callchain\)\n'\
+'([[:space:]]+[[:xdigit:]]+ .*\n)*'\
+'\n'\
+'callchain[[:space:]]+[0-9]+ \[[0-9]+\][[:space:]]+10 instructions:[[:space:]]*\n'\
+'[[:space:]]+[[:xdigit:]]+ do_svc\+0x[[:xdigit:]]+ \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ foo\+0x28 \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ main\+0xc \(.*/callchain\)\n'\
+'([[:space:]]+[[:xdigit:]]+ .*\n)*'\
+'\n'\
+'callchain[[:space:]]+[0-9]+ \[[0-9]+\][[:space:]]+10 instructions:[[:space:]]*\n'\
+'[[:space:]]+[[:xdigit:]]+ (vectors|el.*_64_sync|tramp_vectors)\+0x[[:xdigit:]]+ \(\[kernel\.kallsyms\]\)\n'\
+'[[:space:]]+[[:xdigit:]]+ do_svc\+0x28 \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ foo\+0x28 \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ main\+0xc \(.*/callchain\)\n'\
+'([[:space:]]+[[:xdigit:]]+ .*\n)*'
+}
+
+callchain_pop_regex()
+{
+	printf '%s' \
+'callchain[[:space:]]+[0-9]+ \[[0-9]+\][[:space:]]+10 instructions:[[:space:]]*\n'\
+'[[:space:]]+[[:xdigit:]]+ (ret_to_user|tramp_exit)\+0x[[:xdigit:]]+ \(\[kernel\.kallsyms\]\)\n'\
+'[[:space:]]+[[:xdigit:]]+ do_svc\+0x28 \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ foo\+0x28 \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ main\+0xc \(.*/callchain\)\n'\
+'([[:space:]]+[[:xdigit:]]+ .*\n)*'\
+'\n'\
+'callchain[[:space:]]+[0-9]+ \[[0-9]+\][[:space:]]+10 instructions:[[:space:]]*\n'\
+'[[:space:]]+[[:xdigit:]]+ do_svc\+0x[[:xdigit:]]+ \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ foo\+0x28 \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ main\+0xc \(.*/callchain\)\n' \
+'([[:space:]]+[[:xdigit:]]+ .*\n)*'\
+'\n'\
+'callchain[[:space:]]+[0-9]+ \[[0-9]+\][[:space:]]+10 instructions:[[:space:]]*\n'\
+'[[:space:]]+[[:xdigit:]]+ foo\+0x[[:xdigit:]]+ \(.*/callchain\)\n'\
+'[[:space:]]+[[:xdigit:]]+ main\+0xc \(.*/callchain\)\n'\
+'([[:space:]]+[[:xdigit:]]+ .*\n)*'
+}
+
+skip_if_system_is_not_ready || exit 2
+
+run_test "callchain"
+
+exit $glb_err

-- 
2.34.1




* Re: [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain
  2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
                   ` (7 preceding siblings ...)
  2026-05-26 16:59 ` [PATCH v6 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
@ 2026-05-29 14:57 ` Arnaldo Carvalho de Melo
  8 siblings, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2026-05-29 14:57 UTC (permalink / raw)
  To: Leo Yan
  Cc: John Garry, Will Deacon, James Clark, Mike Leach,
	Suzuki K Poulose, Namhyung Kim, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Ian Rogers, Adrian Hunter, Al Grant, Paschalis Mpeis,
	Amir Ayupov, linux-arm-kernel, coresight, linux-perf-users,
	Leo Yan

On Tue, May 26, 2026 at 05:59:36PM +0100, Leo Yan wrote:
> This series adds thread-stack and synthesized callchain support for Arm
> CoreSight, which comes from older series [1] but heavily rewritten.

Hi Leo,

	Please add what changed from v5, v4, etc.

- Arnaldo
 
> CS ETM previously kept last-branch state in a per-trace-queue buffer.
> That effectively makes the state per CPU, while the call/return history
> belongs to a thread. This series moves branch tracking to the common
> thread-stack code.
> 
> The series records CoreSight branches with thread_stack__event(), uses
> thread_stack__br_sample() for last branch entries, flushes thread stacks
> after decoder resets.
> 
> A decoder reset between AUX trace buffers is treated as a global trace
> discontinuity, so all thread stacks are flushed, so avoids carrying
> stale call/return history across a trace discontinuity.
> 
> One limitation remains for instructions emulated by the kernel. In that
> case the exception return address may not match the return address
> stored in the thread stack, because after exception return can be one
> instruction ahead. The stack can still recover when a later return
> matches an upper caller. Given emulated instructions are not the common
> target for performance callchain analysis. Supporting this would require
> extending the common thread-stack path to accept both the real target
> address and an adjusted address for stack matching, so this series
> leaves that extra complexity out.
> 
> The series has been tested on Orion6 board:
> 
>   perf test 150 -vvv
> 
>   150: Check Arm CoreSight synthesized callchain:
>   --- start ---
>   test child forked, pid 13528
>   Test callchain push: PASS
>   Test callchain pop: PASS
>   ---- end(0) ----
>   150: Check Arm CoreSight synthesized callchain                       : Ok
> 
>   perf script --itrace=g16i10il64
> 
>   callchain_test   17468 [005] 1031003.229943:         10 instructions:
>               aaaac32507c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
>               ffff90bd233c call_init+0x9c (inlined)
>               ffff90bd233c __libc_start_main_impl+0x9c (inlined)
>               aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
> 
>   callchain_test   17468 [005] 1031003.229943:         10 instructions:
>               aaaac3250774 do_svc+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
>               ffff90bd233c call_init+0x9c (inlined)
>               ffff90bd233c __libc_start_main_impl+0x9c (inlined)
>               aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
> 
>   callchain_test   17468 [005] 1031003.229944:         10 instructions:
>           ffff800080010c20 vectors+0x420 ([kernel.kallsyms])
>               aaaac3250784 do_svc+0x1c (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
>               ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
>               ffff90bd233c call_init+0x9c (inlined)
>               ffff90bd233c __libc_start_main_impl+0x9c (inlined)
>               aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
> 
> Note, the test fails on Juno board which is caused by many discontinuity
> packets (mainly caused by NO_SYNC elem). This is likely caused by the
> FIFO overflow on the path.
> 
> [1] https://lore.kernel.org/linux-arm-kernel/20200220052701.7754-1-leo.yan@linaro.org/
> 
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> ---
> Leo Yan (8):
>       perf cs-etm: Decode ETE exception packets
>       perf cs-etm: Refactor instruction size handling
>       perf cs-etm: Use thread-stack for last branch entries
>       perf cs-etm: Flush thread stacks after decoder reset
>       perf cs-etm: Support call indentation
>       perf cs-etm: Filter synthesized branch samples
>       perf cs-etm: Synthesize callchains for instruction samples
>       perf test: Add Arm CoreSight callchain test
> 
>  .../tests/shell/test_arm_coresight_callchain.sh    | 235 ++++++++++++++++
>  tools/perf/util/cs-etm.c                           | 309 ++++++++++++---------
>  2 files changed, 408 insertions(+), 136 deletions(-)
> ---
> base-commit: bd2a5be1fe731bc7548205dd148db75f1d588da2
> change-id: 20260521-b4-arm_cs_callchain_support_v1-2c2a70719bcc
> 
> Best regards,
> -- 
> Leo Yan <leo.yan@arm.com>
> 



end of thread, other threads:[~2026-05-29 14:57 UTC | newest]

Thread overview: 10+ messages
2026-05-26 16:59 [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Leo Yan
2026-05-26 16:59 ` [PATCH v6 1/8] perf cs-etm: Decode ETE exception packets Leo Yan
2026-05-26 16:59 ` [PATCH v6 2/8] perf cs-etm: Refactor instruction size handling Leo Yan
2026-05-26 16:59 ` [PATCH v6 3/8] perf cs-etm: Use thread-stack for last branch entries Leo Yan
2026-05-26 16:59 ` [PATCH v6 4/8] perf cs-etm: Flush thread stacks after decoder reset Leo Yan
2026-05-26 16:59 ` [PATCH v6 5/8] perf cs-etm: Support call indentation Leo Yan
2026-05-26 16:59 ` [PATCH v6 6/8] perf cs-etm: Filter synthesized branch samples Leo Yan
2026-05-26 16:59 ` [PATCH v6 7/8] perf cs-etm: Synthesize callchains for instruction samples Leo Yan
2026-05-26 16:59 ` [PATCH v6 8/8] perf test: Add Arm CoreSight callchain test Leo Yan
2026-05-29 14:57 ` [PATCH v6 0/8] perf cs-etm: Support thread stack and callchain Arnaldo Carvalho de Melo
