[PATCH v2 00/18] perf cs-etm: Queue context packets for frontend

Linux Perf Users

* [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend
@ 2026-06-02 14:26 James Clark
  2026-06-02 14:26 ` [PATCH v2 01/18] " James Clark
                   ` (17 more replies)
  0 siblings, 18 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Fix thread tracking when decoding Coresight trace and add a new test for
it.

The new test is added as a Perf test workload instead of a custom binary
with its own build system, but this requires a new feature in Perf test
to pass in control pipes which can enable and disable events. This
scopes the recording to just the workload and helps to reduce the amount
of data recorded in tracing tests.
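
For illustration only, the control-pipe idea can be sketched as below.
This assumes the workload is handed a writable control fd and that the
recorder accepts the same "enable" and "disable" commands as
perf record --control. ctl_send(), run_scoped() and spin_work() are
hypothetical names and are not code from this series.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one newline-terminated control command to a
 * recorder over a pipe fd. How the fd reaches the workload is an assumption.
 */
static int ctl_send(int ctl_fd, const char *cmd)
{
	size_t len;
	ssize_t n;

	assert(cmd != NULL);
	len = strlen(cmd);
	n = write(ctl_fd, cmd, len);

	return (n == (ssize_t)len) ? 0 : -1;
}

/* Stand-in for the region of interest of a test workload. */
static void spin_work(void)
{
	volatile unsigned int i;

	for (i = 0; i < 1000; i++)
		;
}

/* Hypothetical wrapper: events are enabled only around work(). */
static int run_scoped(int ctl_fd, void (*work)(void))
{
	if (ctl_send(ctl_fd, "enable\n"))
		return -1;
	work();
	return ctl_send(ctl_fd, "disable\n");
}
```

Only the code between the two commands would be recorded, which is what
keeps the amount of trace data small.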

With this new feature, all of the Coresight tests can be rewritten to
use it and the remaining binaries can be removed, which fixes the
following issues:

 * They didn't work in out-of-source builds
 * A lot of the tests unnecessarily required root and didn't skip
   without it
 * They were mainly qualitative tests which didn't look for specific
   behavior

Most importantly, the long build and run times have been reduced. On a
Radxa Orion O6, unroll_loop_thread.c took 37s to compile, which is longer
than the entire Perf build. Now the build time is negligible, and the
before and after test runtimes for all the Coresight tests are:

          |   N1SDP   |   Orion O6
  -----------------------------------
  Before  |   4m  0s  |    14m 49s
  After   |      26s  |        56s
  -----------------------------------

Signed-off-by: James Clark <james.clark@linaro.org>
---
Changes in v2:
- Add --workload-ctl option to Perf test
- Re-write all the Coresight tests and speed them up
- Pass packet to memory access function so frontend can use either the
  previous or current packet's EL
- Link to v1: https://lore.kernel.org/r/20260526-james-cs-context-tracking-fix-v1-0-ebd602e18287@linaro.org

---
James Clark (18):
      perf cs-etm: Queue context packets for frontend
      perf test: Add workload-ctl option
      perf test: Add a workload that forces context switches
      perf test cs-etm: Test process attribution
      perf test: Add deterministic workload
      perf test cs-etm: Replace unroll loop thread with deterministic decode test
      perf test cs-etm: Remove asm_pure_loop test
      perf test cs-etm: Replace memcpy test with raw dump stress test
      perf test: Add named_threads workload
      perf test cs-etm: Test decoding for concurrent threads test
      perf test cs-etm: Remove duplicate branch tests
      perf test cs-etm: Reduce snapshot size
      perf test cs-etm: Speed up basic test
      perf test cs-etm: Remove unused Coresight workloads
      perf test cs-etm: Make disassembly test use kcore
      perf test cs-etm: Add all branch instructions to test
      perf test cs-etm: Speed up disassembly test
      perf test cs-etm: Move existing tests to coresight folder

 Documentation/trace/coresight/coresight-perf.rst   |  78 +------
 MAINTAINERS                                        |   2 -
 tools/perf/Documentation/perf-test.txt             |  18 +-
 tools/perf/Makefile.perf                           |  14 +-
 tools/perf/scripts/python/arm-cs-trace-disasm.py   |  20 +-
 tools/perf/tests/builtin-test.c                    | 187 ++++++++++++++++-
 tools/perf/tests/shell/coresight/Makefile          |  29 ---
 .../perf/tests/shell/coresight/Makefile.miniconfig |  14 --
 tools/perf/tests/shell/coresight/asm_pure_loop.sh  |  22 --
 .../tests/shell/coresight/asm_pure_loop/.gitignore |   1 -
 .../tests/shell/coresight/asm_pure_loop/Makefile   |  34 ---
 .../shell/coresight/asm_pure_loop/asm_pure_loop.S  |  30 ---
 .../tests/shell/coresight/concurrent_threads.sh    |  45 ++++
 .../tests/shell/coresight/context_switch_thread.sh |  69 ++++++
 tools/perf/tests/shell/coresight/deterministic.sh  |  71 +++++++
 .../tests/shell/coresight/memcpy_thread/.gitignore |   1 -
 .../tests/shell/coresight/memcpy_thread/Makefile   |  33 ---
 .../shell/coresight/memcpy_thread/memcpy_thread.c  |  80 -------
 .../tests/shell/coresight/memcpy_thread_16k_10.sh  |  22 --
 .../perf/tests/shell/coresight/raw_dump_stress.sh  |  54 +++++
 .../shell/{ => coresight}/test_arm_coresight.sh    |  43 ++--
 .../{ => coresight}/test_arm_coresight_disasm.sh   |  17 +-
 .../tests/shell/coresight/thread_loop/.gitignore   |   1 -
 .../tests/shell/coresight/thread_loop/Makefile     |  33 ---
 .../shell/coresight/thread_loop/thread_loop.c      |  85 --------
 .../shell/coresight/thread_loop_check_tid_10.sh    |  23 --
 .../shell/coresight/thread_loop_check_tid_2.sh     |  23 --
 .../shell/coresight/unroll_loop_thread/.gitignore  |   1 -
 .../shell/coresight/unroll_loop_thread/Makefile    |  33 ---
 .../unroll_loop_thread/unroll_loop_thread.c        |  75 -------
 .../tests/shell/coresight/unroll_loop_thread_10.sh |  22 --
 tools/perf/tests/shell/lib/coresight.sh            | 134 ------------
 tools/perf/tests/tests.h                           |   3 +
 tools/perf/tests/workloads/Build                   |   4 +
 tools/perf/tests/workloads/context_switch_loop.c   |  95 +++++++++
 tools/perf/tests/workloads/deterministic.c         |  39 ++++
 tools/perf/tests/workloads/named_threads.c         | 109 ++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  21 +-
 tools/perf/util/cs-etm.c                           | 233 +++++++++++++--------
 tools/perf/util/cs-etm.h                           |   8 +-
 40 files changed, 892 insertions(+), 934 deletions(-)
---
base-commit: 5f0ca6b80b12bab1ce06839cdffb6148bb650ff4
change-id: 20260515-james-cs-context-tracking-fix-754998bae7ed

Best regards,
-- 
James Clark <james.clark@linaro.org>



* [PATCH v2 01/18] perf cs-etm: Queue context packets for frontend
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:43   ` sashiko-bot
  2026-06-02 14:26 ` [PATCH v2 02/18] perf test: Add workload-ctl option James Clark
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

PE_CONTEXT elements update the context ID and exception level, but the
decoder may still have prior packets cached for frontend processing.
Updating the context immediately in the decoder backend can make those
cached packets get consumed with the wrong thread or EL state.

Add a CS_ETM_CONTEXT packet carrying the TID and EL to the frontend.
This keeps context changes ordered with the rest of the packet stream
and avoids mismatches when synthesizing samples from cached packets.
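
As a rough sketch of the ordering argument (a toy model, not the perf
implementation): if the context change travels in the same queue as the
range packets, the frontend only sees the new thread once it reaches that
point in the stream. All types and names below (toy_packet, toy_frontend()
and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_RANGE, TOY_CONTEXT };

struct toy_packet {
	enum toy_type type;
	int tid;		/* only meaningful for TOY_CONTEXT */
	unsigned long addr;	/* only meaningful for TOY_RANGE */
};

struct toy_sample {
	int tid;
	unsigned long addr;
};

/*
 * Walk the packets in queue order. A context packet changes the frontend's
 * current thread only when it is reached, so ranges queued before it keep
 * the old tid and ranges queued after it get the new one.
 */
static size_t toy_frontend(const struct toy_packet *q, size_t n,
			   int start_tid, struct toy_sample *out)
{
	int frontend_tid = start_tid;
	size_t i, nr = 0;

	assert(q != NULL && out != NULL);

	for (i = 0; i < n; i++) {
		if (q[i].type == TOY_CONTEXT) {
			frontend_tid = q[i].tid;
			continue;
		}
		out[nr].tid = frontend_tid;
		out[nr].addr = q[i].addr;
		nr++;
	}
	return nr;
}
```

With a queue of { range, context(tid 42), range } and a starting tid of 7,
the first range is attributed to tid 7 and the second to tid 42. Updating
the tid at decode time instead would have attributed both to tid 42.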

Separate the memory access function into one for the frontend and one
for decoding. The frontend also needs memory access to attach the
instruction to samples. Because the frontend does memory access for
both previous and current packets, change all the frontend memory access
function signatures to take both a tidq and a packet. The backend always
uses the current decode EL and thread from the tidq.
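
The split can be pictured with the simplified model below, where a common
helper takes the EL and thread explicitly and two thin wrappers choose
where those come from. The split_* names and the integer stand-ins for
threads and exception levels are assumptions made for this sketch only.

```c
#include <assert.h>

struct split_tidq {
	int decode_el;		/* decoder side: ahead in time */
	int decode_tid;
	int frontend_tid;	/* frontend side: packets being consumed */
};

struct split_pkt {
	int el;			/* EL recorded when the packet was queued */
};

struct split_access {
	int el;
	int tid;
};

/* Common path: the caller decides which EL and thread apply. */
static struct split_access split_mem_access(int el, int tid)
{
	struct split_access a = { .el = el, .tid = tid };

	return a;
}

/* Frontend: EL from the given packet, thread from the frontend context. */
static struct split_access
split_frontend_access(const struct split_tidq *tidq,
		      const struct split_pkt *pkt)
{
	assert(tidq && pkt);
	return split_mem_access(pkt->el, tidq->frontend_tid);
}

/* Decoder: always the current decode context of the queue. */
static struct split_access
split_decoder_access(const struct split_tidq *tidq)
{
	assert(tidq);
	return split_mem_access(tidq->decode_el, tidq->decode_tid);
}
```

Passing the packet to the frontend wrapper is what lets a caller pick
either the previous or the current packet's EL.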

Treat context packets as a boundary for branch sample generation and
remove tidq->prev_packet_thread because it's not possible to branch to a
different thread, so only tracking the current thread is required for
sample generation.

Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Reported-by: Amir Ayupov <aaupov@meta.com>
Closes: https://lore.kernel.org/linux-perf-users/20260515021135.1729028-1-aaupov@meta.com/
Co-authored-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  21 ++-
 tools/perf/util/cs-etm.c                        | 233 +++++++++++++++---------
 tools/perf/util/cs-etm.h                        |   8 +-
 3 files changed, 165 insertions(+), 97 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index dee3020ceaa9..26940f1f1b0b 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_queue *etmq,
 	packet_queue->packet_buffer[et].flags = 0;
 	packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
 	packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+	packet_queue->packet_buffer[et].el = ocsd_EL_unknown;
+	packet_queue->packet_buffer[et].tid = -1;
 
 	if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -449,6 +451,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
 	packet->last_instr_type = elem->last_i_type;
 	packet->last_instr_subtype = elem->last_i_subtype;
 	packet->last_instr_cond = elem->last_instr_cond;
+	packet->el = elem->context.exception_level;
 
 	if (elem->last_i_type == OCSD_INSTR_BR || elem->last_i_type == OCSD_INSTR_BR_INDIRECT)
 		packet->last_instr_taken_branch = elem->last_instr_exec;
@@ -525,7 +528,9 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 			const ocsd_generic_trace_elem *elem,
 			const uint8_t trace_chan_id)
 {
+	struct cs_etm_packet *packet;
 	pid_t tid = -1;
+	int ret;
 
 	/*
 	 * Process the PE_CONTEXT packets if we have a valid contextID or VMID.
@@ -546,12 +551,18 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 		break;
 	}
 
-	if (cs_etm__etmq_set_tid_el(etmq, tid, trace_chan_id,
-				    elem->context.exception_level))
+	if (cs_etm__etmq_update_decode_context(etmq, trace_chan_id,
+				elem->context.exception_level, tid))
 		return OCSD_RESP_FATAL_SYS_ERR;
 
-	if (tid == -1)
-		return OCSD_RESP_CONT;
+	ret = cs_etm_decoder__buffer_packet(etmq, packet_queue, trace_chan_id,
+					    CS_ETM_CONTEXT);
+	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+		return ret;
+
+	packet = &packet_queue->packet_buffer[packet_queue->tail];
+	packet->tid = tid;
+	packet->el = elem->context.exception_level;
 
 	/*
 	 * A timestamp is generated after a PE_CONTEXT element so make sure
@@ -559,7 +570,7 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 	 */
 	cs_etm_decoder__reset_timestamp(packet_queue);
 
-	return OCSD_RESP_CONT;
+	return ret;
 }
 
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 40c6ddfa8c8d..524e2a6feba8 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -85,15 +85,22 @@ struct cs_etm_traceid_queue {
 	u64 period_instructions;
 	size_t last_branch_pos;
 	union perf_event *event_buf;
-	struct thread *thread;
-	struct thread *prev_packet_thread;
-	ocsd_ex_level prev_packet_el;
-	ocsd_ex_level el;
 	struct branch_stack *last_branch;
 	struct branch_stack *last_branch_rb;
 	struct cs_etm_packet *prev_packet;
 	struct cs_etm_packet *packet;
 	struct cs_etm_packet_queue packet_queue;
+
+	struct thread *decode_thread;
+	ocsd_ex_level decode_el;
+
+	/*
+	 * The frontend accesses the EL from '[prev_]packet' because it needs
+	 * previous EL for branch and current EL for instruction samples. It's
+	 * not possible to change thread in a single branch sample so no need to
+	 * store or access the thread through the packet.
+	 */
+	struct thread *frontend_thread;
 };
 
 enum cs_etm_format {
@@ -614,10 +621,12 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 
 	queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
 	tidq->trace_chan_id = trace_chan_id;
-	tidq->el = tidq->prev_packet_el = ocsd_EL_unknown;
-	tidq->thread = machine__findnew_thread(&etm->session->machines.host, -1,
+	tidq->decode_el = ocsd_EL_unknown;
+	tidq->frontend_thread = machine__findnew_thread(&etm->session->machines.host, -1,
+					       queue->tid);
+	tidq->decode_thread = machine__findnew_thread(&etm->session->machines.host, -1,
 					       queue->tid);
-	tidq->prev_packet_thread = machine__idle_thread(&etm->session->machines.host);
+
 
 	tidq->packet = zalloc(sizeof(struct cs_etm_packet));
 	if (!tidq->packet)
@@ -751,20 +760,16 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
 		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 		 * the next incoming packet.
 		 *
-		 * Threads and exception levels are also tracked for both the
-		 * previous and current packets. This is because the previous
-		 * packet is used for the 'from' IP for branch samples, so the
-		 * thread at that time must also be assigned to that sample.
-		 * Across discontinuity packets the thread can change, so by
-		 * tracking the thread for the previous packet the branch sample
-		 * will have the correct info.
+		 * Track exception levels for both the previous and current
+		 * packets. This is because the previous packet's address is
+		 * used for the 'from' IP for branch samples, so the previous EL
+		 * must also be used so that the sample shows it originates from
+		 * the correct EL. Branches can't branch to a different thread,
+		 * so no need to track the previous thread.
 		 */
 		tmp = tidq->packet;
 		tidq->packet = tidq->prev_packet;
 		tidq->prev_packet = tmp;
-		tidq->prev_packet_el = tidq->el;
-		thread__put(tidq->prev_packet_thread);
-		tidq->prev_packet_thread = thread__get(tidq->thread);
 	}
 }
 
@@ -937,8 +942,8 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 
 		/* Free this traceid_queue from the array */
 		tidq = etmq->traceid_queues[idx];
-		thread__zput(tidq->thread);
-		thread__zput(tidq->prev_packet_thread);
+		thread__zput(tidq->frontend_thread);
+		thread__zput(tidq->decode_thread);
 		zfree(&tidq->event_buf);
 		zfree(&tidq->last_branch);
 		zfree(&tidq->last_branch_rb);
@@ -1083,47 +1088,44 @@ static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address,
 	}
 }
 
-static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
-			      u64 address, size_t size, u8 *buffer,
-			      const ocsd_mem_space_acc_t mem_space)
+static u32 __cs_etm__mem_access(struct cs_etm_queue *etmq,
+				u64 address, size_t size, u8 *buffer,
+				const ocsd_mem_space_acc_t mem_space,
+				ocsd_ex_level el, struct thread *thread)
 {
 	u8  cpumode;
 	u64 offset;
 	int len;
 	struct addr_location al;
 	struct dso *dso;
-	struct cs_etm_traceid_queue *tidq;
 	int ret = 0;
 
 	if (!etmq)
 		return 0;
 
 	addr_location__init(&al);
-	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
-	if (!tidq)
-		goto out;
 
 	/*
-	 * We've already tracked EL along side the PID in cs_etm__set_thread()
-	 * so double check that it matches what OpenCSD thinks as well. It
-	 * doesn't distinguish between EL0 and EL1 for this mem access callback
-	 * so we had to do the extra tracking. Skip validation if it's any of
-	 * the 'any' values.
+	 * We've already tracked EL along side the PID in
+	 * cs_etm__etmq_update_decode_context() and in CS_ETM_CONTEXT packets,
+	 * so double check that it matches what OpenCSD thinks as well. OpenCSD
+	 * doesn't distinguish between EL0 and EL1 for this callback so we had
+	 * to do the extra tracking. Skip validation for the 'any' values.
 	 */
 	if (!(mem_space == OCSD_MEM_SPACE_ANY ||
 	      mem_space == OCSD_MEM_SPACE_N || mem_space == OCSD_MEM_SPACE_S)) {
 		if (mem_space & OCSD_MEM_SPACE_EL1N) {
 			/* Includes both non secure EL1 and EL0 */
-			assert(tidq->el == ocsd_EL1 || tidq->el == ocsd_EL0);
+			assert(el == ocsd_EL1 || el == ocsd_EL0);
 		} else if (mem_space & OCSD_MEM_SPACE_EL2)
-			assert(tidq->el == ocsd_EL2);
+			assert(el == ocsd_EL2);
 		else if (mem_space & OCSD_MEM_SPACE_EL3)
-			assert(tidq->el == ocsd_EL3);
+			assert(el == ocsd_EL3);
 	}
 
-	cpumode = cs_etm__cpu_mode(etmq, address, tidq->el);
+	cpumode = cs_etm__cpu_mode(etmq, address, el);
 
-	if (!thread__find_map(tidq->thread, cpumode, address, &al))
+	if (!thread__find_map(thread, cpumode, address, &al))
 		goto out;
 
 	dso = map__dso(al.map);
@@ -1138,7 +1140,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
 
 	map__load(al.map);
 
-	len = dso__data_read_offset(dso, maps__machine(thread__maps(tidq->thread)),
+	len = dso__data_read_offset(dso, maps__machine(thread__maps(thread)),
 				    offset, buffer, size);
 
 	if (len <= 0) {
@@ -1158,6 +1160,30 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
 	return ret;
 }
 
+static u32 cs_etm__frontend_mem_access(struct cs_etm_queue *etmq,
+				       struct cs_etm_traceid_queue *tidq,
+				       struct cs_etm_packet *packet,
+				       u64 address, size_t size, u8 *buffer)
+{
+	return __cs_etm__mem_access(etmq, address, size, buffer, 0, packet->el,
+				    tidq->frontend_thread);
+}
+
+static u32 cs_etm__decoder_mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
+				      u64 address, size_t size, u8 *buffer,
+				      const ocsd_mem_space_acc_t mem_space)
+{
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+	if (!tidq)
+		return 0;
+
+	return __cs_etm__mem_access(etmq, address, size, buffer,
+				    mem_space, tidq->decode_el,
+				    tidq->decode_thread);
+}
+
 static struct cs_etm_queue *cs_etm__alloc_queue(void)
 {
 	struct cs_etm_queue *etmq = zalloc(sizeof(*etmq));
@@ -1333,12 +1359,13 @@ void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq)
 }
 
 static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
-					 u8 trace_chan_id, u64 addr)
+					 struct cs_etm_traceid_queue *tidq,
+					 struct cs_etm_packet *packet, u64 addr)
 {
 	u8 instrBytes[2];
 
-	cs_etm__mem_access(etmq, trace_chan_id, addr, ARRAY_SIZE(instrBytes),
-			   instrBytes, 0);
+	cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+				    ARRAY_SIZE(instrBytes), instrBytes);
 	/*
 	 * T32 instruction size is indicated by bits[15:11] of the first
 	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
@@ -1371,16 +1398,16 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 }
 
 static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
-				     u64 trace_chan_id,
-				     const struct cs_etm_packet *packet,
+				     struct cs_etm_traceid_queue *tidq,
+				     struct cs_etm_packet *packet,
 				     u64 offset)
 {
 	if (packet->isa == CS_ETM_ISA_T32) {
 		u64 addr = packet->start_addr;
 
 		while (offset) {
-			addr += cs_etm__t32_instr_size(etmq,
-						       trace_chan_id, addr);
+			addr += cs_etm__t32_instr_size(etmq, tidq, packet,
+						       addr);
 			offset--;
 		}
 		return addr;
@@ -1490,34 +1517,51 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
 	return etmq->buf_len;
 }
 
-static void cs_etm__set_thread(struct cs_etm_queue *etmq,
-			       struct cs_etm_traceid_queue *tidq, pid_t tid,
-			       ocsd_ex_level el)
+/*
+ * Convert a raw thread number to a thread struct and assign it to **thread.
+ */
+static int cs_etm__etmq_update_thread(struct cs_etm_queue *etmq,
+				      ocsd_ex_level el, pid_t tid,
+				      struct thread **thread)
 {
 	struct machine *machine = cs_etm__get_machine(etmq, el);
 
+	if (!machine || !*thread)
+		return -EINVAL;
+
 	if (tid != -1) {
-		thread__zput(tidq->thread);
-		tidq->thread = machine__find_thread(machine, -1, tid);
+		thread__zput(*thread);
+		*thread = machine__find_thread(machine, -1, tid);
 	}
 
 	/* Couldn't find a known thread */
-	if (!tidq->thread)
-		tidq->thread = machine__idle_thread(machine);
+	if (!*thread)
+		*thread = machine__idle_thread(machine);
 
-	tidq->el = el;
+	return 0;
 }
 
-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
-			    u8 trace_chan_id, ocsd_ex_level el)
+/*
+ * Set the thread and EL of the decode context which is ahead in time of the
+ * frontend context.
+ */
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+				       u8 trace_chan_id,
+				       ocsd_ex_level el, pid_t tid)
 {
 	struct cs_etm_traceid_queue *tidq;
+	int ret;
 
 	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
 	if (!tidq)
 		return -EINVAL;
 
-	cs_etm__set_thread(etmq, tidq, tid, el);
+	ret = cs_etm__etmq_update_thread(etmq, el, tid,
+					 &tidq->decode_thread);
+	if (ret)
+		return ret;
+
+	tidq->decode_el = el;
 	return 0;
 }
 
@@ -1527,8 +1571,8 @@ bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq)
 }
 
 static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
-			      u64 trace_chan_id,
-			      const struct cs_etm_packet *packet,
+			      struct cs_etm_traceid_queue *tidq,
+			      struct cs_etm_packet *packet,
 			      struct perf_sample *sample)
 {
 	/*
@@ -1545,14 +1589,14 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 	 * cs_etm__t32_instr_size().
 	 */
 	if (packet->isa == CS_ETM_ISA_T32)
-		sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
+		sample->insn_len = cs_etm__t32_instr_size(etmq, tidq, packet,
 							  sample->ip);
 	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
 	else
 		sample->insn_len = 4;
 
-	cs_etm__mem_access(etmq, trace_chan_id, sample->ip, sample->insn_len,
-			   (void *)sample->insn, 0);
+	cs_etm__frontend_mem_access(etmq, tidq, packet, sample->ip,
+				    sample->insn_len, (void *)sample->insn);
 }
 
 u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp)
@@ -1588,15 +1632,15 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	perf_sample__init(&sample, /*all=*/true);
 	event->sample.header.type = PERF_RECORD_SAMPLE;
-	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
+	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
 
 	sample.ip = addr;
-	sample.pid = thread__pid(tidq->thread);
-	sample.tid = thread__tid(tidq->thread);
+	sample.pid = thread__pid(tidq->frontend_thread);
+	sample.tid = thread__tid(tidq->frontend_thread);
 	sample.id = etmq->etm->instructions_id;
 	sample.stream_id = etmq->etm->instructions_id;
 	sample.period = period;
@@ -1604,7 +1648,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
+	cs_etm__copy_insn(etmq, tidq, tidq->packet, &sample);
 
 	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
@@ -1649,15 +1693,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
-						     tidq->prev_packet_el);
+						     tidq->prev_packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
 
 	sample.ip = ip;
-	sample.pid = thread__pid(tidq->prev_packet_thread);
-	sample.tid = thread__tid(tidq->prev_packet_thread);
+	sample.pid = thread__pid(tidq->frontend_thread);
+	sample.tid = thread__tid(tidq->frontend_thread);
 	sample.addr = cs_etm__first_executed_instr(tidq->packet);
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
@@ -1666,8 +1710,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet,
-			  &sample);
+	cs_etm__copy_insn(etmq, tidq, tidq->prev_packet, &sample);
 
 	/*
 	 * perf report cannot handle events without a branch stack
@@ -1788,7 +1831,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	int ret;
-	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 instrs_prev;
 
 	/* Get instructions remainder from previous packet */
@@ -1874,8 +1916,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * been executed, but PC has not advanced to next
 			 * instruction)
 			 */
-			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+			addr = cs_etm__instr_addr(etmq, tidq, tidq->packet,
+						  offset - 1);
 			ret = cs_etm__synth_instruction_sample(
 				etmq, tidq, addr,
 				etm->instructions_sample_period);
@@ -2051,9 +2093,9 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
 	return etmq->buf_len;
 }
 
-static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
-				 struct cs_etm_packet *packet,
-				 u64 end_addr)
+static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq,
+				 struct cs_etm_traceid_queue *tidq,
+				 struct cs_etm_packet *packet, u64 end_addr)
 {
 	/* Initialise to keep compiler happy */
 	u16 instr16 = 0;
@@ -2075,8 +2117,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * so below only read 2 bytes as instruction size for T32.
 		 */
 		addr = end_addr - 2;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr16),
-				   (u8 *)&instr16, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr16), (u8 *)&instr16);
 		if ((instr16 & 0xFF00) == 0xDF00)
 			return true;
 
@@ -2091,8 +2133,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * +---------+---------+-------------------------+
 		 */
 		addr = end_addr - 4;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
-				   (u8 *)&instr32, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr32), (u8 *)&instr32);
 		if ((instr32 & 0x0F000000) == 0x0F000000 &&
 		    (instr32 & 0xF0000000) != 0xF0000000)
 			return true;
@@ -2108,8 +2150,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * +-----------------------+---------+-----------+
 		 */
 		addr = end_addr - 4;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
-				   (u8 *)&instr32, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr32), (u8 *)&instr32);
 		if ((instr32 & 0xFFE0001F) == 0xd4000001)
 			return true;
 
@@ -2125,7 +2167,6 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
 			       struct cs_etm_traceid_queue *tidq, u64 magic)
 {
-	u8 trace_chan_id = tidq->trace_chan_id;
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;
 
@@ -2140,7 +2181,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
 	 */
 	if (magic == __perf_cs_etmv4_magic) {
 		if (packet->exception_number == CS_ETMV4_EXC_CALL &&
-		    cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+		    cs_etm__is_svc_instr(etmq, tidq, prev_packet,
 					 prev_packet->end_addr))
 			return true;
 	}
@@ -2178,7 +2219,6 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
 				      struct cs_etm_traceid_queue *tidq,
 				      u64 magic)
 {
-	u8 trace_chan_id = tidq->trace_chan_id;
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;
 
@@ -2204,7 +2244,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
 		 * (SMC, HVC) are taken as sync exceptions.
 		 */
 		if (packet->exception_number == CS_ETMV4_EXC_CALL &&
-		    !cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+		    !cs_etm__is_svc_instr(etmq, tidq, prev_packet,
 					  prev_packet->end_addr))
 			return true;
 
@@ -2228,7 +2268,6 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 {
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;
-	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 magic;
 	int ret;
 
@@ -2309,11 +2348,11 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 		if (prev_packet->flags == (PERF_IP_FLAG_BRANCH |
 					   PERF_IP_FLAG_RETURN |
 					   PERF_IP_FLAG_INTERRUPT) &&
-		    cs_etm__is_svc_instr(etmq, trace_chan_id,
-					 packet, packet->start_addr))
+		    cs_etm__is_svc_instr(etmq, tidq, packet, packet->start_addr)) {
 			prev_packet->flags = PERF_IP_FLAG_BRANCH |
 					     PERF_IP_FLAG_RETURN |
 					     PERF_IP_FLAG_SYSCALLRET;
+		}
 		break;
 	case CS_ETM_DISCONTINUITY:
 		/*
@@ -2394,6 +2433,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 					     PERF_IP_FLAG_RETURN |
 					     PERF_IP_FLAG_INTERRUPT;
 		break;
+	case CS_ETM_CONTEXT:
 	case CS_ETM_EMPTY:
 	default:
 		break;
@@ -2469,6 +2509,18 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 			 */
 			cs_etm__sample(etmq, tidq);
 			break;
+		case CS_ETM_CONTEXT:
+			/*
+			 * Update context but don't swap packet. Keep the previous one for branch
+			 * source address info, if tracing the kernel the context will be updated
+			 * between two branches.
+			 */
+			ret = cs_etm__etmq_update_thread(etmq, tidq->packet->el,
+							 tidq->packet->tid,
+							 &tidq->frontend_thread);
+			if (ret)
+				goto out;
+			break;
 		case CS_ETM_EXCEPTION:
 		case CS_ETM_EXCEPTION_RET:
 			/*
@@ -2497,6 +2549,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 		}
 	}
 
+out:
 	return ret;
 }
 
@@ -2620,7 +2673,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 			if (!tidq)
 				continue;
 
-			if (tid == -1 || thread__tid(tidq->thread) == tid)
+			if (tid == -1 || thread__tid(tidq->frontend_thread) == tid)
 				cs_etm__run_per_thread_timeless_decoder(etmq);
 		} else
 			cs_etm__run_per_cpu_timeless_decoder(etmq);
@@ -3328,7 +3381,7 @@ static int cs_etm__create_queue_decoders(struct cs_etm_queue *etmq)
 	 */
 	if (cs_etm_decoder__add_mem_access_cb(etmq->decoder,
 					      0x0L, ((u64) -1L),
-					      cs_etm__mem_access))
+					      cs_etm__decoder_mem_access))
 		goto out_free_decoder;
 
 	zfree(&t_params);
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index aa9bb4a32eca..b81099c2b301 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -158,6 +158,7 @@ enum cs_etm_sample_type {
 	CS_ETM_DISCONTINUITY,
 	CS_ETM_EXCEPTION,
 	CS_ETM_EXCEPTION_RET,
+	CS_ETM_CONTEXT,
 };
 
 enum cs_etm_isa {
@@ -184,6 +185,8 @@ struct cs_etm_packet {
 	u8 last_instr_size;
 	u8 trace_chan_id;
 	int cpu;
+	int el;
+	pid_t tid;
 };
 
 #define CS_ETM_PACKET_MAX_BUFFER 1024
@@ -259,8 +262,9 @@ enum cs_etm_pid_fmt {
 #include <opencsd/ocsd_if_types.h>
 int cs_etm__get_cpu(struct cs_etm_queue *etmq, u8 trace_chan_id, int *cpu);
 enum cs_etm_pid_fmt cs_etm__get_pid_fmt(struct cs_etm_queue *etmq);
-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
-			    u8 trace_chan_id, ocsd_ex_level el);
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+				       u8 trace_chan_id, ocsd_ex_level el,
+				       pid_t tid);
 bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq);
 void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
 					      u8 trace_chan_id);

-- 
2.34.1



* Re: [PATCH v2 01/18] perf cs-etm: Queue context packets for frontend
  2026-06-02 14:26 ` [PATCH v2 01/18] " James Clark
@ 2026-06-02 14:43   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 14:43 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Incorrect CPU mode attribution in `cs_etm__synth_instruction_sample()` during flush due to relying on `tidq->packet->el` instead of the exception level belonging to the instructions being sampled.
--

commit b9c32b9ed06ee8adb88b0b63943c694b4108e629
Author: James Clark <james.clark@linaro.org>

perf cs-etm: Queue context packets for frontend

This commit adds a CS_ETM_CONTEXT packet to carry thread ID and exception
level to the frontend. It separates memory access functions between frontend
and decoder, and removes prev_packet_thread tracking.

Closes: https://lore.kernel.org/linux-perf-users/20260515021135.1729028-1-aaupov@meta.com/

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 40c6ddfa8c8d9..524e2a6feba8e 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c

[ ... ]

> @@ -1588,15 +1632,15 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  
>  	perf_sample__init(&sample, /*all=*/true);
>  	event->sample.header.type = PERF_RECORD_SAMPLE;
> -	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
> +	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->packet->el);

[Severity: Medium]
Does this misattribute guest traces to the host when synthesizing samples
during a flush?

When cs_etm__flush() or cs_etm__end_block() calls
cs_etm__synth_instruction_sample(), it generates samples for the execution
range of tidq->prev_packet. In these scenarios, tidq->packet holds a
CS_ETM_DISCONTINUITY packet (or a stale packet) where the exception level
is ocsd_EL_unknown.

Passing this unknown EL into cs_etm__cpu_mode() causes guest trace execution
from before the discontinuity to be misattributed to the host machine
(PERF_RECORD_MISC_KERNEL or PERF_RECORD_MISC_USER) in the profile statistics.

>  	event->sample.header.size = sizeof(struct perf_event_header);
>  
>  	/* Set time field based on etm auxtrace config. */
>  	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
>  
>  	sample.ip = addr;
> -	sample.pid = thread__pid(tidq->thread);
> -	sample.tid = thread__tid(tidq->thread);

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=1


* [PATCH v2 02/18] perf test: Add workload-ctl option
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
  2026-06-02 14:26 ` [PATCH v2 01/18] " James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:40   ` sashiko-bot
  2026-06-02 14:26 ` [PATCH v2 03/18] perf test: Add a workload that forces context switches James Clark
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Add a --workload-ctl=fifo:ctl-fifo[,ack-fifo] option for 'perf test
-w'. When set, run_workload() opens the named FIFO, writes enable before
invoking the builtin workload, writes disable before returning, and
waits for ack responses when an ack FIFO is provided to ensure that the
workload doesn't run until the events are enabled.

This can be used to limit the scope of the recording to only the
workload execution and avoid recording Perf setup and teardown code if
Perf record is started with events disabled (-D -1).

Assisted-by: Codex:GPT-5.5
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt |   6 ++
 tools/perf/tests/builtin-test.c        | 184 ++++++++++++++++++++++++++++++++-
 2 files changed, 188 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 32da0d1fa86a..1faf30d4a7be 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -69,3 +69,9 @@ OPTIONS
 
 --list-workloads::
 	List the available workloads to use with -w/--workload.
+
+--workload-ctl=fifo:ctl-fifo[,ack-fifo]::
+	Write 'enable' to ctl-fifo before running the workload and 'disable'
+	before returning. If ack-fifo is provided, the workload runner waits for
+	an 'ack' response after each command. This scopes the recording to only
+	the workload if used with 'perf record -D -1 --control ...'.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index f2c135891477..d5df3efdce3b 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -50,6 +50,7 @@ static bool sequential;
 static unsigned int runs_per_test = 1;
 const char *dso_to_test;
 const char *test_objdump_path = "objdump";
+static const char *workload_control;
 
 /*
  * List of architecture specific tests. Not a weak symbol as the array length is
@@ -161,6 +162,11 @@ static struct test_workload *workloads[] = {
 #endif
 };
 
+struct workload_control {
+	int ctl_fd;
+	int ack_fd;
+};
+
 #define workloads__for_each(workload) \
 	for (unsigned i = 0; i < ARRAY_SIZE(workloads) && ({ workload = workloads[i]; 1; }); i++)
 
@@ -711,13 +717,185 @@ static int workloads__fprintf_list(FILE *fp)
 	return printed;
 }
 
+static int perf_control_open_fifo(struct workload_control *ctl, const char *str)
+{
+	char *s, *p;
+	int ret;
+
+	if (strncmp(str, "fifo:", 5))
+		return -EINVAL;
+
+	str += 5;
+	if (!*str || *str == ',')
+		return -EINVAL;
+
+	s = strdup(str);
+	if (!s)
+		return -ENOMEM;
+
+	p = strchr(s, ',');
+	if (p)
+		*p = '\0';
+
+	ctl->ctl_fd = open(s, O_WRONLY | O_CLOEXEC);
+	if (ctl->ctl_fd < 0) {
+		ret = -errno;
+		pr_err("Failed to open workload control FIFO '%s': %m\n", s);
+		free(s);
+		return ret;
+	}
+
+	if (p && *++p) {
+		ctl->ack_fd = open(p, O_RDONLY | O_CLOEXEC);
+		if (ctl->ack_fd < 0) {
+			ret = -errno;
+			pr_err("Failed to open workload control ack FIFO '%s': %m\n", p);
+			close(ctl->ctl_fd);
+			ctl->ctl_fd = -1;
+			free(s);
+			return ret;
+		}
+	}
+
+	free(s);
+	return 0;
+}
+
+static int perf_control_open(struct workload_control *ctl)
+{
+	int ret;
+
+	if (!workload_control)
+		return 0;
+
+	ret = perf_control_open_fifo(ctl, workload_control);
+
+	if (ret == -EINVAL) {
+		pr_err("Unsupported workload control spec '%s', expected fifo:ctl-fifo[,ack-fifo]\n",
+			workload_control);
+	}
+
+	return ret;
+}
+
+static void perf_control_close(struct workload_control *ctl)
+{
+	if (ctl->ctl_fd >= 0) {
+		close(ctl->ctl_fd);
+		ctl->ctl_fd = -1;
+	}
+	if (ctl->ack_fd >= 0) {
+		close(ctl->ack_fd);
+		ctl->ack_fd = -1;
+	}
+}
+
+static int perf_control_write_cmd(int fd, const char *cmd)
+{
+	size_t len = strlen(cmd);
+	ssize_t ret;
+
+	while (len) {
+		ret = write(fd, cmd, len);
+		if (ret < 0) {
+			if (errno == EINTR)
+				continue;
+			pr_err("Failed to write perf control command '%s': %m\n", cmd);
+			return -1;
+		}
+
+		if (!ret) {
+			pr_err("Failed to write perf control command '%s': short write\n", cmd);
+			return -1;
+		}
+
+		cmd += ret;
+		len -= ret;
+	}
+
+	return 0;
+}
+
+static int perf_control_read_ack(int fd)
+{
+	char buf[16];
+	ssize_t ret;
+
+	do {
+		ret = read(fd, buf, sizeof(buf) - 1);
+	} while (ret < 0 && errno == EINTR);
+
+	if (ret < 0) {
+		pr_err("Failed to read perf control ack: %m\n");
+		return -1;
+	}
+
+	if (!ret) {
+		pr_err("Unexpected EOF while reading perf control ack\n");
+		return -1;
+	}
+
+	buf[ret] = '\0';
+	for (ssize_t i = 0; i < ret; i++) {
+		if (buf[i] == '\n' || buf[i] == '\0') {
+			buf[i] = '\0';
+			break;
+		}
+	}
+
+	if (strcmp(buf, "ack")) {
+		pr_err("Unexpected perf control ack: %s\n", buf);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_control_send(struct workload_control *ctl, const char *cmd)
+{
+	if (ctl->ctl_fd < 0)
+		return 0;
+
+	if (perf_control_write_cmd(ctl->ctl_fd, cmd))
+		return -1;
+
+	if (ctl->ack_fd >= 0 && perf_control_read_ack(ctl->ack_fd))
+		return -1;
+
+	return 0;
+}
+
 static int run_workload(const char *work, int argc, const char **argv)
 {
 	struct test_workload *twl;
 
 	workloads__for_each(twl) {
-		if (!strcmp(twl->name, work))
-			return twl->func(argc, argv);
+		struct workload_control ctl = {
+			.ctl_fd = -1,
+			.ack_fd = -1,
+		};
+		int control_ret, ret;
+
+		if (strcmp(twl->name, work))
+			continue;
+
+		ret = perf_control_open(&ctl);
+		if (ret)
+			return ret;
+
+		if (perf_control_send(&ctl, "enable\n")) {
+			perf_control_close(&ctl);
+			return -1;
+		}
+
+		ret = twl->func(argc, argv);
+
+		control_ret = perf_control_send(&ctl, "disable\n");
+		perf_control_close(&ctl);
+		if (control_ret)
+			return -1;
+
+		return ret;
 	}
 
 	pr_info("No workload found: %s\n", work);
@@ -799,6 +977,8 @@ int cmd_test(int argc, const char **argv)
 	OPT_UINTEGER('r', "runs-per-test", &runs_per_test,
 		     "Run each test the given number of times, default 1"),
 	OPT_STRING('w', "workload", &workload, "work", "workload to run for testing, use '--list-workloads' to list the available ones."),
+	OPT_STRING(0, "workload-ctl", &workload_control, "fifo:ctl-fifo[,ack-fifo]",
+		   "Write enable to the fifo just before running the workload and disable after, with optional ack from ack-fifo"),
 	OPT_BOOLEAN(0, "list-workloads", &list_workloads, "List the available builtin workloads to use with -w/--workload"),
 	OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"),
 	OPT_STRING(0, "objdump", &test_objdump_path, "path",

-- 
2.34.1



* Re: [PATCH v2 02/18] perf test: Add workload-ctl option
  2026-06-02 14:26 ` [PATCH v2 02/18] perf test: Add workload-ctl option James Clark
@ 2026-06-02 14:40   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 14:40 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Incorrect command logged in error message on partial write failure.
--

commit 49160052733bc1acf9df33b623a7dd76ae191072
Author: James Clark <james.clark@linaro.org>

perf test: Add workload-ctl option

This commit adds a --workload-ctl option to 'perf test -w', which uses
control and ack FIFOs to coordinate the execution of builtin workloads.
This allows perf record to be scoped strictly to the workload's execution
time by waiting for synchronization.

> diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
> --- a/tools/perf/tests/builtin-test.c
> +++ b/tools/perf/tests/builtin-test.c
[ ... ]
> +static int perf_control_write_cmd(int fd, const char *cmd)
> +{
> +	size_t len = strlen(cmd);
> +	ssize_t ret;
> +
> +	while (len) {
> +		ret = write(fd, cmd, len);
> +		if (ret < 0) {
> +			if (errno == EINTR)
> +				continue;
> +			pr_err("Failed to write perf control command '%s': %m\n", cmd);

[Severity: Low]
If a previous loop iteration successfully wrote a portion of the bytes,
cmd is advanced (cmd += ret) at the bottom of the loop. If a subsequent
write fails, will this error message log only the truncated suffix of the
command rather than the full original string?

> +			return -1;
> +		}
> +
> +		if (!ret) {
> +			pr_err("Failed to write perf control command '%s': short write\n", cmd);

[Severity: Low]
Similarly, could this also log an incomplete command if cmd was advanced
during a prior partial write?

> +			return -1;
> +		}
> +
> +		cmd += ret;
> +		len -= ret;
> +	}
> +
> +	return 0;
> +}
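
One possible shape for addressing the logging concern above is sketched below. It is not taken from the patch: the name write_cmd_full() is hypothetical, and fprintf(stderr, ...) stands in for pr_err() so that the sketch compiles on its own. The idea is simply to advance a separate cursor and leave the original pointer untouched for the error messages.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical variant of perf_control_write_cmd(): 'pos' walks the
 * buffer while 'cmd' is never modified, so both error messages print
 * the full command even after a partial write.
 */
static int write_cmd_full(int fd, const char *cmd)
{
	const char *pos = cmd;
	size_t len = strlen(cmd);

	while (len) {
		ssize_t ret = write(fd, pos, len);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			fprintf(stderr, "Failed to write command '%s': %s\n",
				cmd, strerror(errno));
			return -1;
		}

		if (!ret) {
			fprintf(stderr, "Failed to write command '%s': short write\n",
				cmd);
			return -1;
		}

		/* Advance only the cursor, never the pointer used for logging. */
		pos += ret;
		len -= ret;
	}

	return 0;
}
```

Calling write_cmd_full(fd, "enable\n") behaves like the original on success; the only intended difference is which string appears in the failure messages.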

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=2


* [PATCH v2 03/18] perf test: Add a workload that forces context switches
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
  2026-06-02 14:26 ` [PATCH v2 01/18] " James Clark
  2026-06-02 14:26 ` [PATCH v2 02/18] perf test: Add workload-ctl option James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:38   ` sashiko-bot
  2026-06-02 14:26 ` [PATCH v2 04/18] perf test cs-etm: Test process attribution James Clark
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

This workload launches two processes that block when reading from and
writing to each other, forcing the other process to be scheduled for each
read/write pair.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt           |  7 +-
 tools/perf/tests/builtin-test.c                  |  1 +
 tools/perf/tests/tests.h                         |  1 +
 tools/perf/tests/workloads/Build                 |  1 +
 tools/perf/tests/workloads/context_switch_loop.c | 95 ++++++++++++++++++++++++
 5 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 1faf30d4a7be..9c0d7ac2bc64 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -55,15 +55,16 @@ OPTIONS
 
 -w::
 --workload=::
-	Run a built-in workload, to list them use '--list-workloads', current ones include:
-	noploop, thloop, leafloop, sqrtloop, brstack, datasym and landlock.
+	Run a built-in workload, to list them use '--list-workloads', current
+	ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
+	context_switch_loop and landlock.
 
 	Used with the shell script regression tests.
 
 	Some accept an extra parameter:
 
 		seconds: leafloop, noploop, sqrtloop, thloop
-		nrloops: brstack
+		nrloops: brstack, context_switch_loop
 
 	The datasym and landlock workloads don't accept any.
 
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index d5df3efdce3b..1057ee836c30 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -156,6 +156,7 @@ static struct test_workload *workloads[] = {
 	&workload__landlock,
 	&workload__traploop,
 	&workload__inlineloop,
+	&workload__context_switch_loop,
 
 #ifdef HAVE_RUST_SUPPORT
 	&workload__code_with_type,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index ee00518bf36f..79f50bacfc94 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -242,6 +242,7 @@ DECLARE_WORKLOAD(datasym);
 DECLARE_WORKLOAD(landlock);
 DECLARE_WORKLOAD(traploop);
 DECLARE_WORKLOAD(inlineloop);
+DECLARE_WORKLOAD(context_switch_loop);
 
 #ifdef HAVE_RUST_SUPPORT
 DECLARE_WORKLOAD(code_with_type);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 2ef97f7affce..3bda6da04a35 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -9,6 +9,7 @@ perf-test-y += datasym.o
 perf-test-y += landlock.o
 perf-test-y += traploop.o
 perf-test-y += inlineloop.o
+perf-test-y += context_switch_loop.o
 
 ifeq ($(CONFIG_RUST_SUPPORT),y)
     perf-test-y += code_with_type.o
diff --git a/tools/perf/tests/workloads/context_switch_loop.c b/tools/perf/tests/workloads/context_switch_loop.c
new file mode 100644
index 000000000000..73bfcaeff5b0
--- /dev/null
+++ b/tools/perf/tests/workloads/context_switch_loop.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/compiler.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../tests.h"
+
+static int loops = 100;
+static char buf;
+int context_switch_loop_work = 1234;
+
+#define write_block(fd) \
+	do { \
+		if (write(fd, &buf, 1) <= 0) \
+			exit(1); \
+	} while (0)
+
+#define read_block(fd) \
+	do { \
+		if (read(fd, &buf, 1) <= 0) \
+			exit(1); \
+	} while (0)
+
+/* Not static to avoid LTO clobbering the function name */
+void context_switch_loop_proc1(int in_fd, int out_fd);
+noinline void context_switch_loop_proc1(int in_fd, int out_fd)
+{
+	for (int i = 0; i < loops; i++) {
+		read_block(in_fd);
+		context_switch_loop_work += i * 3;
+		write_block(out_fd);
+	}
+}
+
+void context_switch_loop_proc2(int in_fd, int out_fd);
+noinline void context_switch_loop_proc2(int in_fd, int out_fd)
+{
+	for (int i = 0; i < loops; i++) {
+		write_block(out_fd);
+		context_switch_loop_work += i * 7;
+		read_block(in_fd);
+	}
+}
+
+/*
+ * Launches two processes that take turns to execute a multiplication N times
+ */
+static int context_switch_loop(int argc, const char **argv)
+{
+	int a_to_b[2], b_to_a[2];
+	pid_t proc1_pid;
+	int status;
+
+	if (argc > 0) {
+		loops = atoi(argv[0]);
+		if (loops < 0) {
+			fprintf(stderr, "Invalid number of loops: %s\n", argv[0]);
+			return 1;
+		}
+	}
+
+	if (pipe(a_to_b) || pipe(b_to_a)) {
+		perror("Pipe error");
+		return 1;
+	}
+
+	proc1_pid = fork();
+	if (proc1_pid < 0) {
+		perror("Fork error");
+		return 1;
+	}
+
+	if (!proc1_pid) {
+		close(a_to_b[0]);
+		close(b_to_a[1]);
+		prctl(PR_SET_NAME, "proc1", 0, 0, 0);
+		context_switch_loop_proc1(b_to_a[0], a_to_b[1]);
+		exit(0);
+	}
+
+	prctl(PR_SET_NAME, "proc2", 0, 0, 0);
+	context_switch_loop_proc2(a_to_b[0], b_to_a[1]);
+
+	if (waitpid(proc1_pid, &status, 0) != proc1_pid || !WIFEXITED(status) ||
+	    WEXITSTATUS(status))
+		return 1;
+
+	return 0;
+}
+
+DEFINE_WORKLOAD(context_switch_loop);

-- 
2.34.1



* Re: [PATCH v2 03/18] perf test: Add a workload that forces context switches
  2026-06-02 14:26 ` [PATCH v2 03/18] perf test: Add a workload that forces context switches James Clark
@ 2026-06-02 14:38   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 14:38 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Parent process fails to close unused pipe ends, causing a deadlock if the child exits prematurely, and leaks file descriptors on all paths.
--

commit 4b7337e6446674341b89bdd15dec4b286776529f
Author: James Clark <james.clark@linaro.org>

perf test: Add a workload that forces context switches

This commit adds a new test workload that forces context switches by launching
two processes that block when reading and writing to each other.

> diff --git a/tools/perf/tests/workloads/context_switch_loop.c b/tools/perf/tests/workloads/context_switch_loop.c
> new file mode 100644
> index 0000000000000..73bfcaeff5b09
> --- /dev/null
> +++ b/tools/perf/tests/workloads/context_switch_loop.c

[ ... ]

> +static int context_switch_loop(int argc, const char **argv)
> +{
> +	int a_to_b[2], b_to_a[2];
> +	pid_t proc1_pid;
> +	int status;

[ ... ]

> +	if (pipe(a_to_b) || pipe(b_to_a)) {

[Severity: Medium]
If pipe(b_to_a) fails, does this return 1 and leak the file descriptors
successfully allocated by pipe(a_to_b)?

> +		perror("Pipe error");
> +		return 1;
> +	}
> +
> +	proc1_pid = fork();
> +	if (proc1_pid < 0) {
> +		perror("Fork error");
> +		return 1;

[Severity: Medium]
Similarly, does this error path leak all 4 file descriptors allocated by the
pipes?

> +	}
> +
> +	if (!proc1_pid) {
> +		close(a_to_b[0]);
> +		close(b_to_a[1]);
> +		prctl(PR_SET_NAME, "proc1", 0, 0, 0);
> +		context_switch_loop_proc1(b_to_a[0], a_to_b[1]);
> +		exit(0);
> +	}
> +
> +	prctl(PR_SET_NAME, "proc2", 0, 0, 0);
> +	context_switch_loop_proc2(a_to_b[0], b_to_a[1]);

[Severity: Medium]
Does the parent process (proc2) need to close its unused pipe ends (a_to_b[1]
and b_to_a[0]) before executing its workload loop?

If the child process (proc1) terminates unexpectedly or calls exit(1) (e.g.,
due to a read/write error), it closes its file descriptors. However, the parent
waiting in read(a_to_b[0]) will not receive an EOF (0) because it still holds
the write end a_to_b[1] open itself. This could cause the parent to block
indefinitely and hang the test suite.

> +
> +	if (waitpid(proc1_pid, &status, 0) != proc1_pid || !WIFEXITED(status) ||
> +	    WEXITSTATUS(status))
> +		return 1;
> +
> +	return 0;

[Severity: Medium]
Do the waitpid error path and successful return path both exit without closing
the remaining pipe file descriptors?

> +}
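
The pipe handling questioned above could look roughly like the following. This is a standalone sketch, not the workload from the patch: ping_pong() and its parameters are hypothetical, and the child is allowed to stop early purely to demonstrate the failure case. Note one assumption that goes beyond the review comment: once the parent closes its unused read end, a write to a dead child raises SIGPIPE, so the sketch ignores that signal and relies on the write() error instead.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical ping-pong helper. The child answers 'child_rounds'
 * exchanges and exits; the parent attempts 'parent_rounds'.
 * Returns the number of rounds the parent completed, -1 on setup failure.
 */
static int ping_pong(int parent_rounds, int child_rounds)
{
	int a_to_b[2], b_to_a[2];
	int done;
	char c = 0;
	pid_t pid;

	if (pipe(a_to_b))
		return -1;
	if (pipe(b_to_a)) {
		/* Second pipe failed: release the first one. */
		close(a_to_b[0]);
		close(a_to_b[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(a_to_b[0]);
		close(a_to_b[1]);
		close(b_to_a[0]);
		close(b_to_a[1]);
		return -1;
	}

	if (!pid) {
		/* Child: reads from b_to_a, writes to a_to_b. */
		close(a_to_b[0]);
		close(b_to_a[1]);
		for (int i = 0; i < child_rounds; i++) {
			if (read(b_to_a[0], &c, 1) <= 0 ||
			    write(a_to_b[1], &c, 1) <= 0)
				_exit(1);
		}
		_exit(0);
	}

	/* Get an error from write() rather than dying when the child is gone. */
	signal(SIGPIPE, SIG_IGN);

	/* Parent: drop the ends it never uses so a dead child yields EOF. */
	close(a_to_b[1]);
	close(b_to_a[0]);

	for (done = 0; done < parent_rounds; done++) {
		if (write(b_to_a[1], &c, 1) <= 0 ||
		    read(a_to_b[0], &c, 1) <= 0)
			break;
	}

	/* Close the remaining ends on every exit path, then reap the child. */
	close(a_to_b[0]);
	close(b_to_a[1]);
	waitpid(pid, NULL, 0);

	return done;
}
```

With the unused ends closed, ping_pong(5, 2) returns after two completed rounds instead of blocking in read(), because the parent no longer holds the write end of its own input pipe.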

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=3


* [PATCH v2 04/18] perf test cs-etm: Test process attribution
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (2 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 03/18] perf test: Add a workload that forces context switches James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 05/18] perf test: Add deterministic workload James Clark
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Run the context switch workload on one CPU and trace it to test that
symbols are attributed to the correct process and that the attribution
changes at the exact point that the context switch happened.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/context_switch_thread.sh | 69 ++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/tools/perf/tests/shell/coresight/context_switch_thread.sh b/tools/perf/tests/shell/coresight/context_switch_thread.sh
new file mode 100755
index 000000000000..0992c35a329d
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/context_switch_thread.sh
@@ -0,0 +1,69 @@
+#!/bin/bash -e
+# Coresight context switch thread attribution (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+if [ "$(id -u)" != 0 ]; then
+	# Requires root for "-C 0" in record command
+	echo "[Skip] No root permission"
+	exit 2
+fi
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_context_switch.XXXXX)
+
+cleanup() {
+	rm -rf "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+check_samples() {
+	owner_samples=$(grep -c "proc1.*context_switch_loop_proc1" "$tmpdir/script" || true)
+	next_samples=$(grep -c "proc2.*context_switch_loop_proc2" "$tmpdir/script" || true)
+
+	if [ "$owner_samples" -eq 0 ] || [ "$next_samples" -eq 0 ]; then
+		echo "No samples found"
+		cleanup
+		exit 1
+	fi
+
+	if grep "proc2.*context_switch_loop_proc1" "$tmpdir/script"; then
+		echo "proc1 symbol was attributed to proc2"
+		cleanup
+		exit 1
+	fi
+
+	if grep "proc1.*context_switch_loop_proc2" "$tmpdir/script"; then
+		echo "proc2 symbol was attributed to proc1"
+		cleanup
+		exit 1
+	fi
+}
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+# Pin to one CPU so the two threads alternate running but record into the same
+# trace buffer. Start disabled and use the control FIFO to only record the
+# workload and not startup.
+perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -C 0 -D -1 --control fifo:"$cf","$af" -- \
+	taskset --cpu-list 0 perf test --workload-ctl fifo:"$cf","$af" \
+	-w context_switch_loop > /dev/null 2>&1
+
+# Test both instruction and branch sample generation modes.
+perf script -i "$tmpdir/data" --itrace=i4 -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null
+check_samples
+perf script -i "$tmpdir/data" --itrace=b -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null
+check_samples
+
+cleanup
+exit 0

-- 
2.34.1



* [PATCH v2 05/18] perf test: Add deterministic workload
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (3 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 04/18] perf test cs-etm: Test process attribution James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:49   ` sashiko-bot
  2026-06-02 14:26 ` [PATCH v2 06/18] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Add a workload that does the same thing every time for testing CPU trace
decoding.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt     |  4 +--
 tools/perf/tests/builtin-test.c            |  1 +
 tools/perf/tests/tests.h                   |  1 +
 tools/perf/tests/workloads/Build           |  2 ++
 tools/perf/tests/workloads/deterministic.c | 39 ++++++++++++++++++++++++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 9c0d7ac2bc64..7ec70c054cac 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -57,7 +57,7 @@ OPTIONS
 --workload=::
 	Run a built-in workload, to list them use '--list-workloads', current
 	ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
-	context_switch_loop and landlock.
+	context_switch_loop, deterministic and landlock.
 
 	Used with the shell script regression tests.
 
@@ -66,7 +66,7 @@ OPTIONS
 		seconds: leafloop, noploop, sqrtloop, thloop
 		nrloops: brstack, context_switch_loop
 
-	The datasym and landlock workloads don't accept any.
+	The datasym, landlock and deterministic workloads don't accept any.
 
 --list-workloads::
 	List the available workloads to use with -w/--workload.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 1057ee836c30..3c6d3de98be9 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -157,6 +157,7 @@ static struct test_workload *workloads[] = {
 	&workload__traploop,
 	&workload__inlineloop,
 	&workload__context_switch_loop,
+	&workload__deterministic,
 
 #ifdef HAVE_RUST_SUPPORT
 	&workload__code_with_type,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 79f50bacfc94..f8bba2d68769 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -243,6 +243,7 @@ DECLARE_WORKLOAD(landlock);
 DECLARE_WORKLOAD(traploop);
 DECLARE_WORKLOAD(inlineloop);
 DECLARE_WORKLOAD(context_switch_loop);
+DECLARE_WORKLOAD(deterministic);
 
 #ifdef HAVE_RUST_SUPPORT
 DECLARE_WORKLOAD(code_with_type);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 3bda6da04a35..599a37c87878 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -10,6 +10,7 @@ perf-test-y += landlock.o
 perf-test-y += traploop.o
 perf-test-y += inlineloop.o
 perf-test-y += context_switch_loop.o
+perf-test-y += deterministic.o
 
 ifeq ($(CONFIG_RUST_SUPPORT),y)
     perf-test-y += code_with_type.o
@@ -22,3 +23,4 @@ CFLAGS_brstack.o          = -g -O0 -fno-inline -U_FORTIFY_SOURCE
 CFLAGS_datasym.o          = -g -O0 -fno-inline -U_FORTIFY_SOURCE
 CFLAGS_traploop.o         = -g -O0 -fno-inline -U_FORTIFY_SOURCE
 CFLAGS_inlineloop.o       = -g -O2
+CFLAGS_deterministic.o    = -g -O0
diff --git a/tools/perf/tests/workloads/deterministic.c b/tools/perf/tests/workloads/deterministic.c
new file mode 100644
index 000000000000..3caea8564043
--- /dev/null
+++ b/tools/perf/tests/workloads/deterministic.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/compiler.h>
+#include "../tests.h"
+
+int dt_work = 1234;
+
+static noinline void function1(void)
+{
+	dt_work *= 7;
+	dt_work *= 7;
+	dt_work *= 7;
+}
+
+static noinline void function2(void)
+{
+	dt_work *= 7;
+	dt_work *= 7;
+	dt_work *= 7;
+}
+
+static int deterministic(int argc __maybe_unused,
+			 const char **argv __maybe_unused)
+{
+	dt_work *= 7;
+	dt_work *= 7;
+	dt_work *= 7;
+
+	function1();
+
+	dt_work *= 7;
+	dt_work *= 7;
+	dt_work *= 7;
+
+	function2();
+
+	return 0;
+}
+
+DEFINE_WORKLOAD(deterministic);

-- 
2.34.1



* Re: [PATCH v2 05/18] perf test: Add deterministic workload
  2026-06-02 14:26 ` [PATCH v2 05/18] perf test: Add deterministic workload James Clark
@ 2026-06-02 14:49   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 14:49 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Medium] Signed integer overflow in dt_work causes Undefined Behavior
- [Medium] Workload state leakage breaks data determinism on consecutive runs
- [Low] Build failure due to missing `-U_FORTIFY_SOURCE` with `-O0`
- [Low] Global namespace pollution from non-static `dt_work`
--

commit cdb1ed35f7dbaf322c0b18289b38a7da8d21039e
Author: James Clark <james.clark@linaro.org>

perf test: Add deterministic workload

This commit adds a new workload for testing CPU trace decoding. It introduces
a deterministic sequence of multiplications and function calls that can be 
used to verify expected behavior across test runs.

[ ... ]

> diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
> index 3bda6da04a350..599a37c87878f 100644
> --- a/tools/perf/tests/workloads/Build
> +++ b/tools/perf/tests/workloads/Build

[ ... ]

> @@ -22,3 +23,4 @@ CFLAGS_brstack.o          = -g -O0 -fno-inline -U_FORTIFY_SOURCE
>  CFLAGS_datasym.o          = -g -O0 -fno-inline -U_FORTIFY_SOURCE
>  CFLAGS_traploop.o         = -g -O0 -fno-inline -U_FORTIFY_SOURCE
>  CFLAGS_inlineloop.o       = -g -O2
> +CFLAGS_deterministic.o    = -g -O0

[Severity: Low]
Does this code introduce a build regression on systems with glibc?

Since the perf build system globally defines _FORTIFY_SOURCE=2 and enables
-Werror by default, compiling with -O0 without also passing -U_FORTIFY_SOURCE
can trigger a warning from <features.h> ("_FORTIFY_SOURCE requires compiling
with optimization (-O)"), which is then escalated to an error.

> diff --git a/tools/perf/tests/workloads/deterministic.c b/tools/perf/tests/workloads/deterministic.c
> new file mode 100644
> index 0000000000000..3caea8564043c
> --- /dev/null
> +++ b/tools/perf/tests/workloads/deterministic.c
> @@ -0,0 +1,39 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/compiler.h>
> +#include "../tests.h"
> +
> +int dt_work = 1234;

[Severity: Low]
Should dt_work be declared static?

It is currently declared at global scope without a unique prefix, which exposes
it to the entire perf tool binary.

[Severity: Medium]
Will dt_work experience undefined behavior due to signed integer overflow?

The variable is initialized to 1234 and is unconditionally multiplied by 7
twelve times during the execution of deterministic(). The 8th multiplication
results in 7,113,764,434, which exceeds INT_MAX (2,147,483,647) for a 32-bit
signed integer.

Could this cause crashes if the perf tool is built with undefined behavior
sanitizer? Perhaps dt_work could be unsigned int or unsigned long long
instead.
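
A minimal sketch of the arithmetic above, not part of the patch. The helper
names first_overflow_step() and mul_wrapping() are hypothetical and a 32-bit
int is assumed. The first helper does the multiplications in 64 bits to find
which one leaves the int range. The second shows the unsigned alternative,
where wraparound is well defined:

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the 1-based index of the first multiplication
 * whose mathematical result exceeds INT_MAX, or 0 if none of the first
 * max_steps multiplications does. Assumes start > 0 and a small factor > 0.
 */
static int first_overflow_step(int start, int factor, int max_steps)
{
	int64_t value = start;

	for (int step = 1; step <= max_steps; step++) {
		value *= factor;
		if (value > INT_MAX)
			return step;
	}

	return 0;
}

/*
 * Hypothetical helper: the same sequence with an unsigned 32-bit type.
 * Unsigned arithmetic wraps modulo 2^32, so this is not undefined behavior.
 */
static uint32_t mul_wrapping(uint32_t start, uint32_t factor, int steps)
{
	uint32_t value = start;

	for (int i = 0; i < steps; i++)
		value *= factor;

	return value;
}
```

Under those assumptions first_overflow_step(1234, 7, 12) returns 8, which
agrees with the count given above.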

> +
> +static noinline void function1(void)
> +{
> +	dt_work *= 7;

[ ... ]

> +static int deterministic(int argc __maybe_unused,
> +			 const char **argv __maybe_unused)
> +{

[Severity: Medium]
Does this code break data determinism on consecutive test runs?

If the workload is executed multiple times within the same process (for example,
using perf test -w deterministic -r N), dt_work will start from the final
overflowed value of the previous run rather than being reset to 1234.
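
One possible way to address this, sketched under assumptions: the names
DT_WORK_INITIAL, dt_work_sketch and deterministic_sketch() are hypothetical
stand-ins and not the patch's code. The idea is only to reset the state on
entry so that every invocation starts from the same value:

```c
#define DT_WORK_INITIAL 1234u

/* Hypothetical stand-in for the workload's global state. */
static unsigned int dt_work_sketch = DT_WORK_INITIAL;

/* Hypothetical stand-in for the workload entry point. */
static unsigned int deterministic_sketch(void)
{
	/* Reset first so a repeated call starts from the same value. */
	dt_work_sketch = DT_WORK_INITIAL;

	dt_work_sketch *= 7;
	dt_work_sketch *= 7;
	dt_work_sketch *= 7;

	return dt_work_sketch;
}
```

Note that adding a statement to the real workload would shift the source line
numbers that deterministic.sh compares against, so the expected output in that
script would likely need updating as well.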

> +	dt_work *= 7;
> +	dt_work *= 7;
> +	dt_work *= 7;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=5

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 06/18] perf test cs-etm: Replace unroll loop thread with deterministic decode test
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (4 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 05/18] perf test: Add deterministic workload James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 07/18] perf test cs-etm: Remove asm_pure_loop test James Clark
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Testing a long sequence without branches seems like it would be better
as a decoder unit test, and this test doesn't test decoding either, so
it's not clear what bugs this is trying to catch.

The new deterministic workload has somewhat long sequences when built
unoptimized, and we can always increase them later if we want to. But
now we test that decoding always gives the same result for the same
sequence of code which we've never had before.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/coresight/deterministic.sh  | 71 ++++++++++++++++++++++
 .../tests/shell/coresight/unroll_loop_thread_10.sh | 22 -------
 2 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/deterministic.sh b/tools/perf/tests/shell/coresight/deterministic.sh
new file mode 100755
index 000000000000..52e033fd6b82
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/deterministic.sh
@@ -0,0 +1,71 @@
+#!/bin/bash -e
+# Coresight deterministic workload decode (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_deterministic.XXXXX)
+
+cleanup() {
+	rm -rf "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+# Start disabled and use the control FIFO to only record the workload and not
+# startup.
+perf record -o "$tmpdir/data" -e cs_etm//u -D -1 --control fifo:"$cf","$af" -- \
+	perf test --workload-ctl fifo:"$cf","$af" -w deterministic > /dev/null 2>&1
+
+perf script -i "$tmpdir/data" --itrace=i1i -F ip,srcline | \
+	grep "deterministic.c" | uniq > "$tmpdir/script" 2>/dev/null
+
+
+# Remove open brace lines as they may not be hit depending on the compiler
+sed -i \
+  -e '/deterministic.c:8$/d' \
+  -e '/deterministic.c:15$/d' \
+  -e '/deterministic.c:23$/d' \
+  "$tmpdir/script"
+
+cat > "$tmpdir/expected" << EOF
+  deterministic.c:24
+  deterministic.c:25
+  deterministic.c:26
+  deterministic.c:28
+  deterministic.c:9
+  deterministic.c:10
+  deterministic.c:11
+  deterministic.c:12
+  deterministic.c:30
+  deterministic.c:31
+  deterministic.c:32
+  deterministic.c:34
+  deterministic.c:16
+  deterministic.c:17
+  deterministic.c:18
+  deterministic.c:19
+  deterministic.c:36
+  deterministic.c:37
+EOF
+
+if ! diff -q "$tmpdir/script" "$tmpdir/expected"; then
+	echo "FAIL: line numbers don't match expected: "
+	head -n 100 "$tmpdir/script"
+	cleanup
+	exit 1
+fi
+
+cleanup
+exit 0
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh b/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh
deleted file mode 100755
index cb3e97a0a89f..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Unroll Loop Thread 10 (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="unroll_loop_thread"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="10"
-DATV="10"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 07/18] perf test cs-etm: Remove asm_pure_loop test
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (5 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 06/18] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 08/18] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

It's not obvious what this test is for so remove it. It's not a stress
test because it doesn't output lots of data and it's not a functional
test because it only looks for raw trace output. It seems to imply that
a program written in assembly influences whether trace would be
generated by the CPU or not, but the CPU doesn't know what language the
program is written in.

We already have lots of Coresight tests that test the full pipeline
including decoding, and in many more modes of operation than this one,
so if no trace was collected they will already fail leaving this one
redundant.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop.sh b/tools/perf/tests/shell/coresight/asm_pure_loop.sh
deleted file mode 100755
index 0301904b9637..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / ASM Pure Loop (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="asm_pure_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS=""
-DATV="out"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 08/18] perf test cs-etm: Replace memcpy test with raw dump stress test
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (6 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 07/18] perf test cs-etm: Remove asm_pure_loop test James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 15:01   ` sashiko-bot
  2026-06-02 14:26 ` [PATCH v2 09/18] perf test: Add named_threads workload James Clark
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Like asm_pure_loop, this memcpy test only checks that 10 of each of a
few trace packet types occur after recording a lot of trace, which isn't
more specific than other existing Coresight tests.

Assume it was supposed to be a stress test for dumping and replace it
with one that doesn't require a custom binary and checks for a specific
amount of raw output. Don't bother checking for packets because the
other tests that test decoding will catch issues with malformed data.

This also adds coverage for exit snapshot mode which was missing.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/memcpy_thread_16k_10.sh  | 22 ---------
 .../perf/tests/shell/coresight/raw_dump_stress.sh  | 54 ++++++++++++++++++++++
 2 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh b/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh
deleted file mode 100755
index 1f765d69acc3..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Memcpy 16k 10 Threads (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="memcpy_thread"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="16 10 1"
-DATV="16k_10"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err
diff --git a/tools/perf/tests/shell/coresight/raw_dump_stress.sh b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
new file mode 100755
index 000000000000..eb5a5e5e1be6
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
@@ -0,0 +1,54 @@
+#!/bin/bash -e
+# Coresight raw dump stress (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+if [ "$(id -u)" != 0 ]; then
+	# Requires root for larger buffer size
+	echo "[Skip] No root permission"
+	exit 2
+fi
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_raw_dump_stress.XXXXX)
+
+cleanup() {
+	rm -r "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+# Use exit snapshot to record 2M of trace to make about 80MB of raw dump data.
+echo "Recording..."
+perf record -e cs_etm/timestamp=0/u -m,2M -Se -o "$tmpdir/data" -- \
+	perf test -w brstack 20000 > /dev/null 2>&1
+
+# Test raw dump runs to completion but don't decode because that's too slow for
+# a test
+echo "Dumping raw trace..."
+perf report --dump-raw-trace -i "$tmpdir/data" 2>/dev/null > "$tmpdir/rawdump"
+err=$?
+
+if [ $err -ne 0 ]; then
+	echo "FAIL: perf report failed with error code $err"
+	cleanup
+	exit 1
+fi
+
+size=$(stat -c%s "$tmpdir/rawdump")
+if [ $size -gt $((50 * 1024 * 1024)) ]; then
+	echo "PASS: Raw dump file is larger than 50MB"
+	cleanup
+	exit 0
+fi
+
+echo "FAIL: Got less than 50MB (${size} bytes)"
+cleanup
+exit 1

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 08/18] perf test cs-etm: Replace memcpy test with raw dump stress test
  2026-06-02 14:26 ` [PATCH v2 08/18] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
@ 2026-06-02 15:01   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 15:01 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The explicit error handling block for `perf report` is fully unreachable due to `set -e`, making the intended diagnostic failure messages dead code.
--

commit 8eeed98667ec7c0548c0c55cf60670252a0dad1d
Author: James Clark <james.clark@linaro.org>

perf test cs-etm: Replace memcpy test with raw dump stress test

This replaces the coresight memcpy test with a raw dump stress test that
verifies the generation of a specific amount of raw output without requiring
a custom binary. It also introduces coverage for exit snapshot mode.

> diff --git a/tools/perf/tests/shell/coresight/raw_dump_stress.sh b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
> new file mode 100755
> index 0000000000000..eb5a5e5e1be62
> --- /dev/null
> +++ b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
> @@ -0,0 +1,54 @@
> +#!/bin/bash -e
> +# Coresight raw dump stress (exclusive)

[ ... ]

> +echo "Dumping raw trace..."
> +perf report --dump-raw-trace -i "$tmpdir/data" 2>/dev/null > "$tmpdir/rawdump"
> +err=$?
> +
> +if [ $err -ne 0 ]; then
> +	echo "FAIL: perf report failed with error code $err"
> +	cleanup
> +	exit 1
> +fi

[Severity: Medium]
Will this error handling block ever be reached if perf report fails?

Since the script starts with #!/bin/bash -e, a non-zero exit code from a
standalone command like perf report causes the shell to exit immediately and
trigger the trap_cleanup EXIT trap.

Could this prevent the err=$? assignment from occurring, leaving the specific
diagnostic message unprinted?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=8

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 09/18] perf test: Add named_threads workload
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (7 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 08/18] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 15:01   ` sashiko-bot
  2026-06-02 14:26 ` [PATCH v2 10/18] perf test cs-etm: Test decoding for concurrent threads test James Clark
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Add a workload that runs X threads, each of which runs a unique function
named "named_threads_thread[x]" that performs a multiplication in a loop
for Y iterations. Each thread sets its name to "thread[x]".

This can be used to test that processor trace decoding handles
concurrent threads correctly and the correct symbols and thread names
are assigned to samples.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt     |   5 +-
 tools/perf/tests/builtin-test.c            |   1 +
 tools/perf/tests/tests.h                   |   1 +
 tools/perf/tests/workloads/Build           |   1 +
 tools/perf/tests/workloads/named_threads.c | 109 +++++++++++++++++++++++++++++
 5 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 7ec70c054cac..778c37f6efdb 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -57,7 +57,7 @@ OPTIONS
 --workload=::
 	Run a built-in workload, to list them use '--list-workloads', current
 	ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
-	context_switch_loop, deterministic and landlock.
+	context_switch_loop, deterministic, named_threads and landlock.
 
 	Used with the shell script regression tests.
 
@@ -66,6 +66,9 @@ OPTIONS
 		seconds: leafloop, noploop, sqrtloop, thloop
 		nrloops: brstack, context_switch_loop
 
+	'named_threads' accepts the number of threads and the number of loops to
+	do in each thread.
+
 	The datasym, landlock and deterministic workloads don't accept any.
 
 --list-workloads::
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 3c6d3de98be9..e5b6707bb119 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -149,6 +149,7 @@ static struct test_suite *generic_tests[] = {
 static struct test_workload *workloads[] = {
 	&workload__noploop,
 	&workload__thloop,
+	&workload__named_threads,
 	&workload__leafloop,
 	&workload__sqrtloop,
 	&workload__brstack,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index f8bba2d68769..ef3c3a269132 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -235,6 +235,7 @@ struct test_workload workload__##work = {	\
 /* The list of test workloads */
 DECLARE_WORKLOAD(noploop);
 DECLARE_WORKLOAD(thloop);
+DECLARE_WORKLOAD(named_threads);
 DECLARE_WORKLOAD(leafloop);
 DECLARE_WORKLOAD(sqrtloop);
 DECLARE_WORKLOAD(brstack);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 599a37c87878..81f5516f0767 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -2,6 +2,7 @@
 
 perf-test-y += noploop.o
 perf-test-y += thloop.o
+perf-test-y += named_threads.o
 perf-test-y += leafloop.o
 perf-test-y += sqrtloop.o
 perf-test-y += brstack.o
diff --git a/tools/perf/tests/workloads/named_threads.c b/tools/perf/tests/workloads/named_threads.c
new file mode 100644
index 000000000000..dc8070a98df4
--- /dev/null
+++ b/tools/perf/tests/workloads/named_threads.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <limits.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <linux/compiler.h>
+#include "../tests.h"
+
+#define MAX_THREADS 25
+
+static int iterations = 500;
+int named_threads_work = 1234;
+
+typedef void *(*thread_fn_t)(void *);
+
+#define DEFINE_THREAD(n)						\
+noinline void *named_threads_thread##n(void *arg __maybe_unused)	\
+{									\
+	pthread_setname_np(pthread_self(), "thread" #n);		\
+	for (int i = 0; i < iterations; i++)				\
+		named_threads_work *= 3;				\
+									\
+	return NULL;							\
+}
+
+#define THREAD_LIST(macro)	\
+	macro(1)		\
+	macro(2)		\
+	macro(3)		\
+	macro(4)		\
+	macro(5)		\
+	macro(6)		\
+	macro(7)		\
+	macro(8)		\
+	macro(9)		\
+	macro(10)		\
+	macro(11)		\
+	macro(12)		\
+	macro(13)		\
+	macro(14)		\
+	macro(15)		\
+	macro(16)		\
+	macro(17)		\
+	macro(18)		\
+	macro(19)		\
+	macro(20)		\
+	macro(21)		\
+	macro(22)		\
+	macro(23)		\
+	macro(24)		\
+	macro(25)
+
+#define DECLARE_THREAD(n) void *named_threads_thread##n(void *arg);
+
+THREAD_LIST(DECLARE_THREAD)
+THREAD_LIST(DEFINE_THREAD)
+
+#define THREAD_ENTRY(n) named_threads_thread##n,
+
+static thread_fn_t thread_fns[MAX_THREADS] = {
+	THREAD_LIST(THREAD_ENTRY)
+};
+
+/*
+ * Creates argv[0] threads that run a unique function named "thread[x]" which performs
+ * a multiplication in a loop for argv[1] loops.
+ */
+static int named_threads(int argc, const char **argv)
+{
+	pthread_t threads[MAX_THREADS];
+	int nr_threads = 1;
+	int err = 0;
+
+	if (argc > 0)
+		nr_threads = atoi(argv[0]);
+
+	if (nr_threads <= 0 || nr_threads > MAX_THREADS) {
+		fprintf(stderr, "Error: num threads must be 1 - %d\n", MAX_THREADS);
+		return 1;
+	}
+
+	if (argc > 1)
+		iterations = atoi(argv[1]);
+
+	if (iterations < 0) {
+		fprintf(stderr, "Error: iterations must be non-negative\n");
+		return 1;
+	}
+
+	for (int i = 0; i < nr_threads; i++) {
+		int ret;
+
+		ret = pthread_create(&threads[i], NULL, thread_fns[i], NULL);
+		if (ret) {
+			fprintf(stderr, "Error: failed to create thread%d: %s\n",
+				i + 1, strerror(ret));
+			return 1;
+		}
+	}
+
+	for (int i = 0; i < nr_threads; i++)
+		pthread_join(threads[i], NULL);
+
+	return err;
+}
+
+DEFINE_WORKLOAD(named_threads);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 09/18] perf test: Add named_threads workload
  2026-06-02 14:26 ` [PATCH v2 09/18] perf test: Add named_threads workload James Clark
@ 2026-06-02 15:01   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 15:01 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Signed integer overflow and data races in the test workload loop cause Undefined Behavior (UB), which triggers TSAN/UBSAN failures and allows the compiler to optimize the loop away.
--

commit 2c5615e29c7533d5597f41700af3d8e8f97a8e97
Author: James Clark <james.clark@linaro.org>

perf test: Add named_threads workload

This commit adds a workload that spawns multiple threads which run a unique
function performing a multiplication loop. Each thread sets its own name,
which is used to test that processor trace decoding handles concurrent
threads correctly.

> diff --git a/tools/perf/tests/workloads/named_threads.c b/tools/perf/tests/workloads/named_threads.c
> new file mode 100644
> index 0000000000000..dc8070a98df4c
> --- /dev/null
> +++ b/tools/perf/tests/workloads/named_threads.c

[ ... ]

> +static int iterations = 500;
> +int named_threads_work = 1234;
> +
> +typedef void *(*thread_fn_t)(void *);
> +
> +#define DEFINE_THREAD(n)						\
> +noinline void *named_threads_thread##n(void *arg __maybe_unused)	\
> +{									\
> +	pthread_setname_np(pthread_self(), "thread" #n);		\
> +	for (int i = 0; i < iterations; i++)				\
> +		named_threads_work *= 3;				\

[Severity: Medium]
Will this loop trigger undefined behavior due to signed integer overflow and
data races?

Because named_threads_work is a shared global signed integer modified
concurrently without synchronization, this creates a data race.

Furthermore, multiplying 1234 by 3 for 500 iterations overflows a 32-bit signed
integer on the 14th iteration (1234 * 3^14 = 5,902,183,746).

Since both data races and signed integer overflow are undefined behavior,
could the compiler aggressively optimize this loop away entirely, defeating
the purpose of generating trace data for the processor trace test?

Would it be safer to use a thread-local variable or declare it as a
volatile unsigned int, similar to the leafloop workload, to prevent
sanitizer failures and ensure the work is actually simulated?
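
A minimal sketch of that suggestion, assuming a per-thread accumulator is
acceptable for the test. thread_body_sketch() is a hypothetical name and not
the patch's code. A local volatile unsigned variable avoids the shared write
(no data race), wraps with defined behavior, and keeps the compiler from
removing the loop:

```c
#include <stdint.h>

/*
 * Hypothetical per-thread loop body. Each caller gets its own accumulator,
 * so concurrent threads never write to the same object.
 */
static uint32_t thread_body_sketch(int iterations)
{
	/* volatile forces every multiplication to be performed. */
	volatile uint32_t work = 1234;

	for (int i = 0; i < iterations; i++)
		work *= 3;

	return work;
}
```

The return value is only there so the result can be inspected. A real thread
function would still return NULL as in the patch.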

> +									\
> +	return NULL;							\
> +}

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=9

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 10/18] perf test cs-etm: Test decoding for concurrent threads test
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (8 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 09/18] perf test: Add named_threads workload James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 11/18] perf test cs-etm: Remove duplicate branch tests James Clark
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

The thread_loop test only looks for context IDs in the raw trace.
There's a lot more that can go wrong when decoding these, so replace it
with a test that looks at the final output for matching thread names and
symbols.

In the future we might use timestamps and context switch events to track
threads, so looking at context IDs in the raw trace wouldn't always
work.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/concurrent_threads.sh    | 45 ++++++++++++++++++++++
 .../shell/coresight/thread_loop_check_tid_10.sh    | 23 -----------
 .../shell/coresight/thread_loop_check_tid_2.sh     | 23 -----------
 3 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/concurrent_threads.sh b/tools/perf/tests/shell/coresight/concurrent_threads.sh
new file mode 100755
index 000000000000..bf34d4ee77a6
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/concurrent_threads.sh
@@ -0,0 +1,45 @@
+#!/bin/bash -e
+# Coresight concurrent threads (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_concurrent_threads.XXXXX)
+
+cleanup() {
+	rm -rf "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+nthreads=10
+
+# Timestamps off to reduce trace size, start disabled and use the control FIFO
+# to only record the workload and not startup.
+perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -D -1 --control fifo:"$cf","$af" \
+	-- perf test --workload-ctl fifo:"$cf","$af" -w named_threads $nthreads 1 > /dev/null 2>&1
+
+perf script -i "$tmpdir/data" > "$tmpdir/script" 2>/dev/null
+
+# Check all threads were traced and they have the correct thread name and symbol
+for i in $(seq 1 $nthreads); do
+	if ! grep -q "thread${i} .* named_threads_thread${i}" "$tmpdir/script"; then
+		echo "Error: thread${i} missing" >&2
+		cleanup
+		exit 1
+	fi
+done
+
+cleanup
+exit 0
diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh
deleted file mode 100755
index 7f43a93a2ac2..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Thread Loop 10 Threads - Check TID (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="thread_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="10 1"
-DATV="check-tid-10th"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-STDO="$DATD/perf-$TEST-$DATV.stdout"
-
-SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO
-
-perf_dump_aux_tid_verify "$DATA" "$STDO"
-
-err=$?
-exit $err
diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh
deleted file mode 100755
index a94d2079ed06..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Thread Loop 2 Threads - Check TID (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="thread_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="2 20"
-DATV="check-tid-2th"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-STDO="$DATD/perf-$TEST-$DATV.stdout"
-
-SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO
-
-perf_dump_aux_tid_verify "$DATA" "$STDO"
-
-err=$?
-exit $err

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 11/18] perf test cs-etm: Remove duplicate branch tests
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (9 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 10/18] perf test cs-etm: Test decoding for concurrent threads test James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 15:05   ` sashiko-bot
  2026-06-02 14:26 ` [PATCH v2 12/18] perf test cs-etm: Reduce snapshot size James Clark
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

We already test branch output in perf script mode, but then retest it in
Perf report mode. This is more of a test of Perf itself than Coresight
because Perf uses the same samples to generate both outputs. Also we're
already testing instruction output in Perf report mode.

Remove this test for a speedup. On the systemwide test also remove the
Perf report test because systemwide mode records a lot more data so
running multiple tests on it has a big runtime impact.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index bbf89e944e7b..8ed2c934c87d 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -20,6 +20,12 @@ skip_if_no_cs_etm_event() {
 
 skip_if_no_cs_etm_event || exit 2
 
+if [ "$(id -u)" != 0 ]; then
+	# Requires root for -C and system wide tests
+	echo "[Skip] No root permission"
+	exit 2
+fi
+
 perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
 file=$(mktemp /tmp/temporary_file.XXXXX)
 
@@ -52,17 +58,6 @@ perf_script_branch_samples() {
 		grep -E " +$1 +[0-9]+ .* +branches:(.*:)? +" > /dev/null 2>&1
 }
 
-perf_report_branch_samples() {
-	echo "Looking at perf.data file for reporting branch samples:"
-
-	# Below is an example of the branch samples reporting:
-	#   73.04%    73.04%  touch    libc-2.27.so      [.] _dl_addr
-	#    7.71%     7.71%  touch    libc-2.27.so      [.] getenv
-	#    2.59%     2.59%  touch    ld-2.27.so        [.] strcmp
-	perf report --stdio -i ${perfdata} 2>&1 | \
-		grep -E " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
-}
-
 perf_report_instruction_samples() {
 	echo "Looking at perf.data file for instruction samples:"
 
@@ -123,7 +118,6 @@ arm_cs_iterate_devices() {
 
 			record_touch_file $device_name $2 &&
 			perf_script_branch_samples touch &&
-			perf_report_branch_samples touch &&
 			perf_report_instruction_samples touch
 
 			err=$?
@@ -154,9 +148,7 @@ arm_cs_etm_system_wide_test() {
 
 	# System-wide mode should include perf samples so test for that
 	# instead of ls
-	perf_script_branch_samples perf &&
-	perf_report_branch_samples perf &&
-	perf_report_instruction_samples perf
+	perf_script_branch_samples perf
 
 	err=$?
 	arm_cs_report "CoreSight system wide testing" $err
@@ -179,7 +171,6 @@ arm_cs_etm_snapshot_test() {
 	wait $PERFPID
 
 	perf_script_branch_samples dd &&
-	perf_report_branch_samples dd &&
 	perf_report_instruction_samples dd
 
 	err=$?
@@ -191,7 +182,6 @@ arm_cs_etm_basic_test() {
 	perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1
 
 	perf_script_branch_samples ls &&
-	perf_report_branch_samples ls &&
 	perf_report_instruction_samples ls
 
 	err=$?

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 11/18] perf test cs-etm: Remove duplicate branch tests
  2026-06-02 14:26 ` [PATCH v2 11/18] perf test cs-etm: Remove duplicate branch tests James Clark
@ 2026-06-02 15:05   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 15:05 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Undocumented global root check inappropriately skips the entire test suite for non-root users.
--

commit e286213d8a9568ef4b5ffeadc05f64626a70b603
Author: James Clark <james.clark@linaro.org>

perf test cs-etm: Remove duplicate branch tests

This commit removes duplicate branch tests in perf script and perf report
modes for the Coresight ETM tests to improve test execution speed. It also
removes the perf report test from system-wide mode due to its large runtime
impact.

> diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
> index bbf89e944e7b1..8ed2c934c87d0 100755
> --- a/tools/perf/tests/shell/test_arm_coresight.sh
> +++ b/tools/perf/tests/shell/test_arm_coresight.sh
> @@ -20,6 +20,12 @@ skip_if_no_cs_etm_event() {
>  
>  skip_if_no_cs_etm_event || exit 2
>  
> +if [ "$(id -u)" != 0 ]; then
> +	# Requires root for -C and system wide tests
> +	echo "[Skip] No root permission"
> +	exit 2
> +fi
> +

[Severity: Medium]
The commit message states the patch's purpose is to remove duplicate branch
tests for a speedup, but this adds a new top-level check that skips the
entire test suite for non-root users. Is this intentional?

Relying strictly on UID=0 ignores modern capability-based access such as
CAP_PERFMON or CAP_SYS_ADMIN, as well as permissive sysctls like
kernel.perf_event_paranoid=-1.

Could this also unnecessarily bypass per-thread trace tests later in the
script that do not require system-wide privileges?
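
To make the question above concrete, here is a rough sketch of a narrower
gate. It is not part of the patch: can_trace_system_wide is a hypothetical
helper, and treating kernel.perf_event_paranoid <= -1 as sufficient for the
cs_etm system-wide tests is an assumption that has not been verified on
CoreSight hardware. Capability checks (CAP_PERFMON, CAP_SYS_ADMIN) are left
out of the sketch.

```shell
#!/bin/sh
# Hypothetical helper, not from the patch under review.
# $1 = effective UID, $2 = value of kernel.perf_event_paranoid
# Returns 0 when system-wide tracing is assumed to be permitted.
can_trace_system_wide() {
	uid=$1
	paranoid=$2
	# Root is always allowed
	[ "$uid" -eq 0 ] 2>/dev/null && return 0
	# Assumption: -1 (or lower) lifts the restriction for all users
	[ "$paranoid" -le -1 ] 2>/dev/null && return 0
	return 1
}

# Fall back to the restrictive value 2 if the sysctl cannot be read
paranoid=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo 2)
if can_trace_system_wide "$(id -u)" "$paranoid"; then
	echo "system-wide tests: run"
else
	echo "system-wide tests: skip"
fi
```

With a check like this applied only around the -C and system-wide tests,
the per-thread tests could possibly still run for unprivileged users.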

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=11


* [PATCH v2 12/18] perf test cs-etm: Reduce snapshot size
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (10 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 11/18] perf test cs-etm: Remove duplicate branch tests James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 13/18] perf test cs-etm: Speed up basic test James Clark
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

The default buffer size for root is 4MB, which is very slow to decode.
We only need a few KB to verify that the dd process is hit, so reduce
the size to 128KB.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 8ed2c934c87d..da2f599393e2 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -156,7 +156,7 @@ arm_cs_etm_system_wide_test() {
 
 arm_cs_etm_snapshot_test() {
 	echo "Recording trace with snapshot mode"
-	perf record -o ${perfdata} -e cs_etm// -S \
+	perf record -o ${perfdata} -e cs_etm// -S -m,128K \
 		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
 	PERFPID=$!
 

-- 
2.34.1



* [PATCH v2 13/18] perf test cs-etm: Speed up basic test
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (11 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 12/18] perf test cs-etm: Reduce snapshot size James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 14/18] perf test cs-etm: Remove unused Coresight workloads James Clark
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

As the name says, this should be the most basic test possible. Kernel
recording is slow and already has coverage in the system-wide test. Perf
report output also has coverage elsewhere, and 'ls' produces more trace
than 'true'.

We only want to test whether the combination of recording options works
at all, so record userspace only, drop the Perf report check and trace
'true' instead of 'ls' to make it as fast as possible.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index da2f599393e2..83295a8fe179 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -179,10 +179,9 @@ arm_cs_etm_snapshot_test() {
 
 arm_cs_etm_basic_test() {
 	echo "Recording trace with '$*'"
-	perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1
+	perf record -o ${perfdata} "$@" -- true > /dev/null 2>&1
 
-	perf_script_branch_samples ls &&
-	perf_report_instruction_samples ls
+	perf_script_branch_samples true
 
 	err=$?
 	arm_cs_report "CoreSight basic testing with '$*'" $err
@@ -246,12 +245,12 @@ arm_cs_etm_snapshot_test
 
 # Test all combinations of per-thread, system-wide and normal mode with
 # and without timestamps
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u
 
 arm_cs_etm_sparse_cpus_test
 

-- 
2.34.1



* [PATCH v2 14/18] perf test cs-etm: Remove unused Coresight workloads
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (12 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 13/18] perf test cs-etm: Speed up basic test James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 15/18] perf test cs-etm: Make disassembly test use kcore James Clark
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

These are now unused and had various issues, such as not working with
out-of-source builds and being slow to compile. Delete them.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 Documentation/trace/coresight/coresight-perf.rst   |  78 +-----------
 MAINTAINERS                                        |   1 -
 tools/perf/Makefile.perf                           |  14 +--
 tools/perf/tests/shell/coresight/Makefile          |  29 -----
 .../perf/tests/shell/coresight/Makefile.miniconfig |  14 ---
 .../tests/shell/coresight/asm_pure_loop/.gitignore |   1 -
 .../tests/shell/coresight/asm_pure_loop/Makefile   |  34 ------
 .../shell/coresight/asm_pure_loop/asm_pure_loop.S  |  30 -----
 .../tests/shell/coresight/memcpy_thread/.gitignore |   1 -
 .../tests/shell/coresight/memcpy_thread/Makefile   |  33 -----
 .../shell/coresight/memcpy_thread/memcpy_thread.c  |  80 ------------
 .../tests/shell/coresight/thread_loop/.gitignore   |   1 -
 .../tests/shell/coresight/thread_loop/Makefile     |  33 -----
 .../shell/coresight/thread_loop/thread_loop.c      |  85 -------------
 .../shell/coresight/unroll_loop_thread/.gitignore  |   1 -
 .../shell/coresight/unroll_loop_thread/Makefile    |  33 -----
 .../unroll_loop_thread/unroll_loop_thread.c        |  75 ------------
 tools/perf/tests/shell/lib/coresight.sh            | 134 ---------------------
 18 files changed, 5 insertions(+), 672 deletions(-)

diff --git a/Documentation/trace/coresight/coresight-perf.rst b/Documentation/trace/coresight/coresight-perf.rst
index 30be89320621..0a77741a431e 100644
--- a/Documentation/trace/coresight/coresight-perf.rst
+++ b/Documentation/trace/coresight/coresight-perf.rst
@@ -112,78 +112,6 @@ Example for triggering AUX pause and resume with PMU event::
 Perf test - Verify kernel and userspace perf CoreSight work
 -----------------------------------------------------------
 
-When you run perf test, it will do a lot of self tests. Some of those
-tests will cover CoreSight (only if enabled and on ARM64). You
-generally would run perf test from the tools/perf directory in the
-kernel tree. Some tests will check some internal perf support like:
-
-   Check Arm CoreSight trace data recording and synthesized samples
-   Check Arm SPE trace data recording and synthesized samples
-
-Some others will actually use perf record and some test binaries that
-are in tests/shell/coresight and will collect traces to ensure a
-minimum level of functionality is met. The scripts that launch these
-tests are in the same directory. These will all look like:
-
-   CoreSight / ASM Pure Loop
-   CoreSight / Memcpy 16k 10 Threads
-   CoreSight / Thread Loop 10 Threads - Check TID
-   etc.
-
-These perf record tests will not run if the tool binaries do not exist
-in tests/shell/coresight/\*/ and will be skipped. If you do not have
-CoreSight support in hardware then either do not build perf with
-CoreSight support or remove these binaries in order to not have these
-tests fail and have them skip instead.
-
-These tests will log historical results in the current working
-directory (e.g. tools/perf) and will be named stats-\*.csv like:
-
-   stats-asm_pure_loop-out.csv
-   stats-memcpy_thread-16k_10.csv
-   ...
-
-These statistic files log some aspects of the AUX data sections in
-the perf data output counting some numbers of certain encodings (a
-good way to know that it's working in a very simple way). One problem
-with CoreSight is that given a large enough amount of data needing to
-be logged, some of it can be lost due to the processor not waking up
-in time to read out all the data from buffers etc.. You will notice
-that the amount of data collected can vary a lot per run of perf test.
-If you wish to see how this changes over time, simply run perf test
-multiple times and all these csv files will have more and more data
-appended to it that you can later examine, graph and otherwise use to
-figure out if things have become worse or better.
-
-This means sometimes these tests fail as they don't capture all the
-data needed. This is about tracking quality and amount of data
-produced over time and to see when changes to the Linux kernel improve
-quality of traces.
-
-Be aware that some of these tests take quite a while to run, specifically
-in processing the perf data file and dumping contents to then examine what
-is inside.
-
-You can change where these csv logs are stored by setting the
-PERF_TEST_CORESIGHT_STATDIR environment variable before running perf
-test like::
-
-   export PERF_TEST_CORESIGHT_STATDIR=/var/tmp
-   perf test
-
-They will also store resulting perf output data in the current
-directory for later inspection like::
-
-   perf-asm_pure_loop-out.data
-   perf-memcpy_thread-16k_10.data
-   ...
-
-You can alter where the perf data files are stored by setting the
-PERF_TEST_CORESIGHT_DATADIR environment variable such as::
-
-   PERF_TEST_CORESIGHT_DATADIR=/var/tmp
-   perf test
-
-You may wish to set these above environment variables if you wish to
-keep the output of tests outside of the current working directory for
-longer term storage and examination.
+There is a set of Perf tests for CoreSight which can be run with::
+
+  sudo perf test coresight
diff --git a/MAINTAINERS b/MAINTAINERS
index b539be153f6a..7efb893edcbb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2751,7 +2751,6 @@ F:	tools/perf/arch/arm/util/cs-etm.h
 F:	tools/perf/arch/arm/util/pmu.c
 F:	tools/perf/tests/shell/*coresight*
 F:	tools/perf/tests/shell/coresight/*
-F:	tools/perf/tests/shell/lib/*coresight*
 F:	tools/perf/util/cs-etm-decoder/*
 F:	tools/perf/util/cs-etm.*
 
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 4ac2a0cec9ee..e4f8c979f47b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -508,16 +508,7 @@ arm64-sysreg-defs-clean:
 	$(Q)$(MAKE) -C $(arm64_gen_sysreg_dir) O=$(arm64_gen_sysreg_outdir) \
 		prefix= subdir= clean > /dev/null
 
-TESTS_CORESIGHT_DIR := $(srctree)/tools/perf/tests/shell/coresight
-
-tests-coresight-targets: FORCE
-	$(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR)
-
-tests-coresight-targets-clean:
-	$(call QUIET_CLEAN, coresight)
-	$(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR) O=$(OUTPUT) clean >/dev/null
-
-all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS) tests-coresight-targets
+all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS)
 
 # Create python binding output directory if not already present
 $(shell [ -d '$(OUTPUT)python' ] || mkdir -p '$(OUTPUT)python')
@@ -896,7 +887,6 @@ install-tests: all install-gtk
 		$(INSTALL) tests/shell/base_report/*.txt '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/base_report'; \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight' ; \
 		$(INSTALL) tests/shell/coresight/*.sh '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight'
-	$(Q)$(MAKE) -C tests/shell/coresight install-tests
 
 install-bin: install-tools install-tests
 
@@ -939,7 +929,7 @@ endif
 
 clean:: $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBSYMBOL)-clean $(LIBPERF)-clean \
 		arm64-sysreg-defs-clean fixdep-clean python-clean bpf-skel-clean \
-		tests-coresight-targets-clean pmu-events-clean
+		pmu-events-clean
 	$(call QUIET_CLEAN, core-objs)  $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive \
 		$(OUTPUT)perf-iostat $(LANG_BINDINGS)
 	$(Q)find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '*.a' -delete -o \
diff --git a/tools/perf/tests/shell/coresight/Makefile b/tools/perf/tests/shell/coresight/Makefile
deleted file mode 100644
index fa08fd9a5991..000000000000
--- a/tools/perf/tests/shell/coresight/Makefile
+++ /dev/null
@@ -1,29 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../../../../../tools/scripts/Makefile.include
-include ../../../../../tools/scripts/Makefile.arch
-include ../../../../../tools/scripts/utilities.mak
-
-SUBDIRS = \
-	asm_pure_loop \
-	memcpy_thread \
-	thread_loop \
-	unroll_loop_thread
-
-all: $(SUBDIRS)
-$(SUBDIRS):
-	@$(MAKE) -C $@ >/dev/null
-
-INSTALLDIRS = $(SUBDIRS:%=install-%)
-
-install-tests: $(INSTALLDIRS)
-$(INSTALLDIRS):
-	@$(MAKE) -C $(@:install-%=%) install-tests >/dev/null
-
-CLEANDIRS = $(SUBDIRS:%=clean-%)
-
-clean: $(CLEANDIRS)
-$(CLEANDIRS):
-	$(call QUIET_CLEAN, test-$(@:clean-%=%)) $(MAKE) -C $(@:clean-%=%) clean >/dev/null
-
-.PHONY: all clean $(SUBDIRS) $(CLEANDIRS) $(INSTALLDIRS)
diff --git a/tools/perf/tests/shell/coresight/Makefile.miniconfig b/tools/perf/tests/shell/coresight/Makefile.miniconfig
deleted file mode 100644
index 5f72a9cb43f3..000000000000
--- a/tools/perf/tests/shell/coresight/Makefile.miniconfig
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-ifndef DESTDIR
-prefix ?= $(HOME)
-endif
-
-DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
-INSTALL = install
-INSTDIR_SUB = tests/shell/coresight
-
-include ../../../../../scripts/Makefile.include
-include ../../../../../scripts/Makefile.arch
-include ../../../../../scripts/utilities.mak
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore b/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore
deleted file mode 100644
index 468673ac32e8..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-asm_pure_loop
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile b/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile
deleted file mode 100644
index 206849e92bc9..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile
+++ /dev/null
@@ -1,34 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=asm_pure_loop
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=
-
-all: $(BIN)
-
-$(BIN): $(BIN).S
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line - this is raw asm with no libc to have an always exact binary
-	$(Q)$(CC) $(BIN).S -nostdlib -static -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S b/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S
deleted file mode 100644
index 577760046772..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Tamas Zsoldos <tamas.zsoldos@arm.com>, 2021 */
-
-.globl _start
-_start:
-	mov	x0, 0x0000ffff
-	mov	x1, xzr
-loop:
-	nop
-	nop
-	cbnz	x1, noskip
-	nop
-	nop
-	adrp	x2, skip
-	add 	x2, x2, :lo12:skip
-	br	x2
-	nop
-	nop
-noskip:
-	nop
-	nop
-skip:
-	sub	x0, x0, 1
-	cbnz	x0, loop
-
-	mov	x0, #0
-	mov	x8, #93 // __NR_exit syscall
-	svc	#0
-
-.section .note.GNU-stack, "", @progbits
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore b/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore
deleted file mode 100644
index f8217e56091e..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-memcpy_thread
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile b/tools/perf/tests/shell/coresight/memcpy_thread/Makefile
deleted file mode 100644
index 2db637eb2c26..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=memcpy_thread
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
-	$(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c b/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c
deleted file mode 100644
index 7e879217be30..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c
+++ /dev/null
@@ -1,80 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-
-struct args {
-	unsigned long loops;
-	unsigned long size;
-	pthread_t th;
-	void *ret;
-};
-
-static void *thrfn(void *arg)
-{
-	struct args *a = arg;
-	unsigned long i, len = a->loops;
-	unsigned char *src, *dst;
-
-	src = malloc(a->size * 1024);
-	dst = malloc(a->size * 1024);
-	if ((!src) || (!dst)) {
-		printf("ERR: Can't allocate memory\n");
-		exit(1);
-	}
-	for (i = 0; i < len; i++)
-		memcpy(dst, src, a->size * 1024);
-
-	return NULL;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
-	pthread_t t;
-	pthread_attr_t attr;
-
-	pthread_attr_init(&attr);
-	pthread_create(&t, &attr, fn, arg);
-	return t;
-}
-
-int main(int argc, char **argv)
-{
-	unsigned long i, len, size, thr;
-	struct args args[256];
-	long long v;
-
-	if (argc < 4) {
-		printf("ERR: %s [copysize Kb] [numthreads] [numloops (hundreds)]\n", argv[0]);
-		exit(1);
-	}
-
-	v = atoll(argv[1]);
-	if ((v < 1) || (v > (1024 * 1024))) {
-		printf("ERR: max memory 1GB (1048576 KB)\n");
-		exit(1);
-	}
-	size = v;
-	thr = atol(argv[2]);
-	if ((thr < 1) || (thr > 256)) {
-		printf("ERR: threads 1-256\n");
-		exit(1);
-	}
-	v = atoll(argv[3]);
-	if ((v < 1) || (v > 40000000000ll)) {
-		printf("ERR: loops 1-40000000000 (hundreds)\n");
-		exit(1);
-	}
-	len = v * 100;
-	for (i = 0; i < thr; i++) {
-		args[i].loops = len;
-		args[i].size = size;
-		args[i].th = new_thr(thrfn, &(args[i]));
-	}
-	for (i = 0; i < thr; i++)
-		pthread_join(args[i].th, &(args[i].ret));
-	return 0;
-}
diff --git a/tools/perf/tests/shell/coresight/thread_loop/.gitignore b/tools/perf/tests/shell/coresight/thread_loop/.gitignore
deleted file mode 100644
index 6d4c33eaa9e8..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-thread_loop
diff --git a/tools/perf/tests/shell/coresight/thread_loop/Makefile b/tools/perf/tests/shell/coresight/thread_loop/Makefile
deleted file mode 100644
index ea846c038e7a..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=thread_loop
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
-	$(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c b/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c
deleted file mode 100644
index 86f3f548b006..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c
+++ /dev/null
@@ -1,85 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-// define this for gettid()
-#define _GNU_SOURCE
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-#include <sys/syscall.h>
-#ifndef SYS_gettid
-// gettid is 178 on arm64
-# define SYS_gettid 178
-#endif
-#define gettid() syscall(SYS_gettid)
-
-struct args {
-	unsigned int loops;
-	pthread_t th;
-	void *ret;
-};
-
-static void *thrfn(void *arg)
-{
-	struct args *a = arg;
-	int i = 0, len = a->loops;
-
-	if (getenv("SHOW_TID")) {
-		unsigned long long tid = gettid();
-
-		printf("%llu\n", tid);
-	}
-	asm volatile(
-		"loop:\n"
-		"add %w[i], %w[i], #1\n"
-		"cmp %w[i], %w[len]\n"
-		"blt loop\n"
-		: /* out */
-		: /* in */ [i] "r" (i), [len] "r" (len)
-		: /* clobber */
-	);
-	return (void *)(long)i;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
-	pthread_t t;
-	pthread_attr_t attr;
-
-	pthread_attr_init(&attr);
-	pthread_create(&t, &attr, fn, arg);
-	return t;
-}
-
-int main(int argc, char **argv)
-{
-	unsigned int i, len, thr;
-	struct args args[256];
-
-	if (argc < 3) {
-		printf("ERR: %s [numthreads] [numloops (millions)]\n", argv[0]);
-		exit(1);
-	}
-
-	thr = atoi(argv[1]);
-	if ((thr < 1) || (thr > 256)) {
-		printf("ERR: threads 1-256\n");
-		exit(1);
-	}
-	len = atoi(argv[2]);
-	if ((len < 1) || (len > 4000)) {
-		printf("ERR: max loops 4000 (millions)\n");
-		exit(1);
-	}
-	len *= 1000000;
-	for (i = 0; i < thr; i++) {
-		args[i].loops = len;
-		args[i].th = new_thr(thrfn, &(args[i]));
-	}
-	for (i = 0; i < thr; i++)
-		pthread_join(args[i].th, &(args[i].ret));
-	return 0;
-}
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore b/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore
deleted file mode 100644
index 2cb4e996dbf3..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-unroll_loop_thread
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile b/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile
deleted file mode 100644
index 6264c4e3abd1..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=unroll_loop_thread
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
-	$(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c b/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c
deleted file mode 100644
index 8f4e1c985ca3..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c
+++ /dev/null
@@ -1,75 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-
-struct args {
-	pthread_t th;
-	unsigned int in;
-	void *ret;
-};
-
-static void *thrfn(void *arg)
-{
-	struct args *a = arg;
-	unsigned int i, in = a->in;
-
-	for (i = 0; i < 10000; i++) {
-		asm volatile (
-// force an unroll of thia add instruction so we can test long runs of code
-#define SNIP1 "add %w[in], %w[in], #1\n"
-// 10
-#define SNIP2 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1
-// 100
-#define SNIP3 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2
-// 1000
-#define SNIP4 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3
-// 10000
-#define SNIP5 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4
-// 100000
-			SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5
-			: /* out */
-			: /* in */ [in] "r" (in)
-			: /* clobber */
-		);
-	}
-
-	return NULL;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
-	pthread_t t;
-	pthread_attr_t attr;
-
-	pthread_attr_init(&attr);
-	pthread_create(&t, &attr, fn, arg);
-	return t;
-}
-
-int main(int argc, char **argv)
-{
-	unsigned int i, thr;
-	struct args args[256];
-
-	if (argc < 2) {
-		printf("ERR: %s [numthreads]\n", argv[0]);
-		exit(1);
-	}
-
-	thr = atoi(argv[1]);
-	if ((thr > 256) || (thr < 1)) {
-		printf("ERR: threads 1-256\n");
-		exit(1);
-	}
-	for (i = 0; i < thr; i++) {
-		args[i].in = rand();
-		args[i].th = new_thr(thrfn, &(args[i]));
-	}
-	for (i = 0; i < thr; i++)
-		pthread_join(args[i].th, &(args[i].ret));
-	return 0;
-}
diff --git a/tools/perf/tests/shell/lib/coresight.sh b/tools/perf/tests/shell/lib/coresight.sh
deleted file mode 100644
index 184d62e7e5bd..000000000000
--- a/tools/perf/tests/shell/lib/coresight.sh
+++ /dev/null
@@ -1,134 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-# This is sourced from a driver script so no need for #!/bin... etc. at the
-# top - the assumption below is that it runs as part of sourcing after the
-# test sets up some basic env vars to say what it is.
-
-# This currently works with ETMv4 / ETF not any other packet types at thi
-# point. This will need changes if that changes.
-
-# perf record options for the perf tests to use
-PERFRECMEM="-m ,16M"
-PERFRECOPT="$PERFRECMEM -e cs_etm//u"
-
-TOOLS=$(dirname $0)
-DIR="$TOOLS/$TEST"
-BIN="$DIR/$TEST"
-# If the test tool/binary does not exist and is executable then skip the test
-if ! test -x "$BIN"; then exit 2; fi
-# If CoreSight is not available, skip the test
-perf list pmu | grep -q cs_etm || exit 2
-DATD="."
-# If the data dir env is set then make the data dir use that instead of ./
-if test -n "$PERF_TEST_CORESIGHT_DATADIR"; then
-	DATD="$PERF_TEST_CORESIGHT_DATADIR";
-fi
-# If the stat dir env is set then make the data dir use that instead of ./
-STATD="."
-if test -n "$PERF_TEST_CORESIGHT_STATDIR"; then
-	STATD="$PERF_TEST_CORESIGHT_STATDIR";
-fi
-
-# Called if the test fails - error code 1
-err() {
-	echo "$1"
-	exit 1
-}
-
-# Check that some statistics from our perf
-check_val_min() {
-	STATF="$4"
-	if test "$2" -lt "$3"; then
-		echo ", FAILED" >> "$STATF"
-		err "Sanity check number of $1 is too low ($2 < $3)"
-	fi
-}
-
-perf_dump_aux_verify() {
-	# Some basic checking that the AUX chunk contains some sensible data
-	# to see that we are recording something and at least a minimum
-	# amount of it. We should almost always see Fn packets in just about
-	# anything but certainly we will see some trace info and async
-	# packets
-	DUMP="$DATD/perf-tmp-aux-dump.txt"
-	perf report --stdio --dump -i "$1" | \
-		grep -o -e I_ATOM_F -e I_ASYNC -e I_TRACE_INFO > "$DUMP"
-	# Simply count how many of these packets we find to see that we are
-	# producing a reasonable amount of data - exact checks are not sane
-	# as this is a lossy process where we may lose some blocks and the
-	# compiler may produce different code depending on the compiler and
-	# optimization options, so this is rough just to see if we're
-	# either missing almost all the data or all of it
-	ATOM_FX_NUM=$(grep -c I_ATOM_F "$DUMP")
-	ASYNC_NUM=$(grep -c I_ASYNC "$DUMP")
-	TRACE_INFO_NUM=$(grep -c I_TRACE_INFO "$DUMP")
-	rm -f "$DUMP"
-
-	# Arguments provide minimums for a pass
-	CHECK_FX_MIN="$2"
-	CHECK_ASYNC_MIN="$3"
-	CHECK_TRACE_INFO_MIN="$4"
-
-	# Write out statistics, so over time you can track results to see if
-	# there is a pattern - for example we have less "noisy" results that
-	# produce more consistent amounts of data each run, to see if over
-	# time any techinques to  minimize data loss are having an effect or
-	# not
-	STATF="$STATD/stats-$TEST-$DATV.csv"
-	if ! test -f "$STATF"; then
-		echo "ATOM Fx Count, Minimum, ASYNC Count, Minimum, TRACE INFO Count, Minimum" > "$STATF"
-	fi
-	echo -n "$ATOM_FX_NUM, $CHECK_FX_MIN, $ASYNC_NUM, $CHECK_ASYNC_MIN, $TRACE_INFO_NUM, $CHECK_TRACE_INFO_MIN" >> "$STATF"
-
-	# Actually check to see if we passed or failed.
-	check_val_min "ATOM_FX" "$ATOM_FX_NUM" "$CHECK_FX_MIN" "$STATF"
-	check_val_min "ASYNC" "$ASYNC_NUM" "$CHECK_ASYNC_MIN" "$STATF"
-	check_val_min "TRACE_INFO" "$TRACE_INFO_NUM" "$CHECK_TRACE_INFO_MIN" "$STATF"
-	echo ", Ok" >> "$STATF"
-}
-
-perf_dump_aux_tid_verify() {
-	# Specifically crafted test will produce a list of Tread ID's to
-	# stdout that need to be checked to  see that they have had trace
-	# info collected in AUX blocks in the perf data. This will go
-	# through all the TID's that are listed as CID=0xabcdef and see
-	# that all the Thread IDs the test tool reports are  in the perf
-	# data AUX chunks
-
-	# The TID test tools will print a TID per stdout line that are being
-	# tested
-	TIDS=$(cat "$2")
-	# Scan the perf report to find the TIDs that are actually CID in hex
-	# and build a list of the ones found
-	FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \
-			grep -o "CID=0x[0-9a-z]\+" | sed 's/CID=//g' | \
-			uniq | sort | uniq)
-	# No CID=xxx found - maybe your kernel is reporting these as
-	# VMID=xxx so look there
-	if test -z "$FOUND_TIDS"; then
-		FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \
-				grep -o "VMID=0x[0-9a-z]\+" | sed 's/VMID=//g' | \
-				uniq | sort | uniq)
-	fi
-
-	# Iterate over the list of TIDs that the test says it has and find
-	# them in the TIDs found in the perf report
-	MISSING=""
-	for TID2 in $TIDS; do
-		FOUND=""
-		for TIDHEX in $FOUND_TIDS; do
-			TID=$(printf "%i" $TIDHEX)
-			if test "$TID" -eq "$TID2"; then
-				FOUND="y"
-				break
-			fi
-		done
-		if test -z "$FOUND"; then
-			MISSING="$MISSING $TID"
-		fi
-	done
-	if test -n "$MISSING"; then
-		err "Thread IDs $MISSING not found in perf AUX data"
-	fi
-}

-- 
2.34.1


* [PATCH v2 15/18] perf test cs-etm: Make disassembly test use kcore
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (13 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 14/18] perf test cs-etm: Remove unused Coresight workloads James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 16/18] perf test cs-etm: Add all branch instructions to test James Clark
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Hits in modules return empty disassembly when vmlinux is the input to
objdump. Make the disassembly test more reliable by always using kcore,
and update the comments to say that the script supports this.

Signed-off-by: James Clark <james.clark@linaro.org>
---
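For illustration only (not part of the patch), a minimal sketch of how a
caller might pick the -k argument: use the kcore copy saved by a --kcore
recording when it exists, otherwise pass nothing so the script autodetects
vmlinux. The helper name kcore_arg is hypothetical; only the
kcore_dir/kcore path is taken from the diff below.

```shell
#!/bin/sh
# Hypothetical helper: print "-k <path>" when the recording directory
# holds a kcore copy, otherwise print nothing.
kcore_arg() {
	perfdata=$1
	if [ -e "$perfdata/kcore_dir/kcore" ]; then
		printf '%s %s\n' "-k" "$perfdata/kcore_dir/kcore"
	fi
}

# Example use (not run here):
#   perf script -i "$perfdata" -s python:"$script_path" -- \
#           -d $(kcore_arg "$perfdata")
```

The command substitution is left unquoted in the usage comment on purpose,
so an empty result adds no argument at all.
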
 tools/perf/scripts/python/arm-cs-trace-disasm.py    | 20 ++++++++++----------
 tools/perf/tests/shell/test_arm_coresight_disasm.sh |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index ba208c90d631..8f6fa4a007b4 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -18,29 +18,29 @@ from perf_trace_context import perf_sample_srccode, perf_config_get
 
 # Below are some example commands for using this script.
 # Note a --kcore recording is required for accurate decode
-# due to the alternatives patching mechanism. However this
-# script only supports reading vmlinux for disassembly dump,
-# meaning that any patched instructions will appear
-# as unpatched, but the instruction ranges themselves will
-# be correct. In addition to this, source line info comes
-# from Perf, and when using kcore there is no debug info. The
-# following lists the supported features in each mode:
+# due to the alternatives patching mechanism. In addition to this,
+# source line info comes from Perf, and when using kcore there is
+# no debug info. The following lists the supported features in each mode:
 #
 # +-----------+-----------------+------------------+------------------+
 # | Recording | Accurate decode | Source line dump | Disassembly dump |
 # +-----------+-----------------+------------------+------------------+
 # | --kcore   | yes             | no               | yes              |
-# | normal    | no              | yes              | yes              |
+# | normal    | no              | yes (inaccurate) | yes (inaccurate) |
 # +-----------+-----------------+------------------+------------------+
 #
 # Output disassembly with objdump and auto detect vmlinux
-# (when running on same machine.)
+# (when running on same machine.):
 #  perf script -s scripts/python/arm-cs-trace-disasm.py -d
 #
 # Output disassembly with llvm-objdump:
 #  perf script -s scripts/python/arm-cs-trace-disasm.py \
 #		-- -d llvm-objdump-11 -k path/to/vmlinux
 #
+# Output accurate disassembly by passing kcore to script:
+#  perf script -s scripts/python/arm-cs-trace-disasm.py \
+#		-- -d -k perf.data/kcore_dir/kcore
+#
 # Output only source line and symbols:
 #  perf script -s scripts/python/arm-cs-trace-disasm.py
 
@@ -57,7 +57,7 @@ def int_arg(v):
 
 args = argparse.ArgumentParser()
 args.add_argument("-k", "--vmlinux",
-		  help="Set path to vmlinux file. Omit to autodetect if running on same machine")
+		  help="Set path to vmlinux or kcore file. Omit to autodetect if running on same machine")
 args.add_argument("-d", "--objdump", nargs="?", const=default_objdump(),
 		  help="Show disassembly. Can also be used to change the objdump path"),
 args.add_argument("-v", "--verbose", action="store_true", help="Enable debugging log")
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 0dfb4fadf531..8b5c60a09012 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -46,7 +46,7 @@ if [ -e /proc/kcore ]; then
 	echo "Testing kernel disassembly"
 	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
-		-d --stop-sample=30 2> /dev/null > ${file}
+		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
 	grep -q -e ${branch_search} ${file}
 	echo "Found kernel branches"
 else

-- 
2.34.1


* [PATCH v2 16/18] perf test cs-etm: Add all branch instructions to test
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (14 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 15/18] perf test cs-etm: Make disassembly test use kcore James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 14:26 ` [PATCH v2 17/18] perf test cs-etm: Speed up disassembly test James Clark
  2026-06-02 14:27 ` [PATCH v2 18/18] perf test cs-etm: Move existing tests to coresight folder James Clark
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

If we reduce the number of samples searched to speed up the test, then
there will be less chance of hitting one of these branches. Extend the
regex to cover all branches so the test will always pass.

Signed-off-by: James Clark <james.clark@linaro.org>
---
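As an illustration only (not part of the patch), the extended regex can be
exercised against a few hand-written lines. The pattern is copied from the
diff below; the sample lines and the is_branch helper are assumptions made
up for this sketch and are not real objdump output.

```shell
#!/bin/sh
# Pattern copied from the patch; everything else here is illustrative.
branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al))?|br|blr|ret|cbz|cbnz|tbz|tbnz|svc|eret)([[:space:]]|$)'

# Hypothetical helper: succeed if the line contains a branch mnemonic.
is_branch() {
	printf '%s\n' "$1" | grep -q -E "$branch_search"
}

# Made-up sample lines, loosely shaped like disassembly output.
for line in \
	"  ffff8000: 94000010  bl  ffff8040" \
	"  ffff8004: 54000061  b.ne  ffff8010" \
	"  ffff8008: d65f03c0  ret" \
	"  ffff800c: 91000400  add  x0, x0, #0x1"
do
	if is_branch "$line"; then
		echo "branch:     $line"
	else
		echo "not branch: $line"
	fi
done
```

The pattern is quoted when passed to grep in this sketch so the shell does
not treat its bracket and question-mark characters as glob patterns.
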
 tools/perf/tests/shell/test_arm_coresight_disasm.sh | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 8b5c60a09012..5ee87eb7973e 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -38,8 +38,7 @@ cleanup_files()
 trap cleanup_files EXIT TERM INT
 
 # Ranges start and end on branches, so check for some likely branch instructions
-sep="\s\|\s"
-branch_search="\sbl${sep}b${sep}b.ne${sep}b.eq${sep}cbz\s"
+branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al))?|br|blr|ret|cbz|cbnz|tbz|tbnz|svc|eret)([[:space:]]|$)'
 
 ## Test kernel ##
 if [ -e /proc/kcore ]; then
@@ -47,7 +46,7 @@ if [ -e /proc/kcore ]; then
 	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
 		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
-	grep -q -e ${branch_search} ${file}
+	grep -q -E ${branch_search} ${file}
 	echo "Found kernel branches"
 else
 	# kcore is required for correct kernel decode due to runtime code patching
@@ -59,7 +58,7 @@ echo "Testing userspace disassembly"
 perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
 perf script -i ${perfdata} -s python:${script_path} -- \
 	-d --stop-sample=30 2> /dev/null > ${file}
-grep -q -e ${branch_search} ${file}
+grep -q -E ${branch_search} ${file}
 echo "Found userspace branches"
 
 glb_err=0

-- 
2.34.1


* [PATCH v2 17/18] perf test cs-etm: Speed up disassembly test
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (15 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 16/18] perf test cs-etm: Add all branch instructions to test James Clark
@ 2026-06-02 14:26 ` James Clark
  2026-06-02 15:26   ` sashiko-bot
  2026-06-02 14:27 ` [PATCH v2 18/18] perf test cs-etm: Move existing tests to coresight folder James Clark
  17 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-02 14:26 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

We can use an exit snapshot (-Se) to limit the amount of trace to decode
here too. Also, each call to objdump is quite expensive on kcore, so
limit the decode to 2 samples instead of 30. The test only needs to
detect the case where there is no data at all.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight_disasm.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 5ee87eb7973e..2a400fd38a36 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -43,9 +43,9 @@ branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)
 ## Test kernel ##
 if [ -e /proc/kcore ]; then
 	echo "Testing kernel disassembly"
-	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
+	perf record -o ${perfdata} -e cs_etm//k --kcore -Se -m,64K -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
-		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
+		-d --stop-sample=2 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
 	grep -q -E ${branch_search} ${file}
 	echo "Found kernel branches"
 else
@@ -55,9 +55,9 @@ fi
 
 ## Test user ##
 echo "Testing userspace disassembly"
-perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
+perf record -o ${perfdata} -e cs_etm//u -Se -m,64K -- touch $file > /dev/null 2>&1
 perf script -i ${perfdata} -s python:${script_path} -- \
-	-d --stop-sample=30 2> /dev/null > ${file}
+	-d --stop-sample=2 2> /dev/null > ${file}
 grep -q -E ${branch_search} ${file}
 echo "Found userspace branches"
 

-- 
2.34.1


* Re: [PATCH v2 17/18] perf test cs-etm: Speed up disassembly test
  2026-06-02 14:26 ` [PATCH v2 17/18] perf test cs-etm: Speed up disassembly test James Clark
@ 2026-06-02 15:26   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-02 15:26 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Medium] Limiting the disassembly test to `--stop-sample=2` can make the test flaky, as early trace blocks split by exceptions may contain no branch instructions.
--

commit 6abc0c1405ac75056533c403bf19e73b43a0c26a
Author: James Clark <james.clark@linaro.org>

perf test cs-etm: Speed up disassembly test

This commit attempts to speed up the disassembly test by using exit snapshots
and limiting the perf script output to 2 samples instead of 30. This avoids
expensive objdump calls, particularly when using kcore.

> diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
> --- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
> +++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
> @@ -43,9 +43,9 @@ branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)
>  ## Test kernel ##
>  if [ -e /proc/kcore ]; then
>  	echo "Testing kernel disassembly"
> -	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
> +	perf record -o ${perfdata} -e cs_etm//k --kcore -Se -m,64K -- touch $file > /dev/null 2>&1
>  	perf script -i ${perfdata} -s python:${script_path} -- \
> -		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
> +		-d --stop-sample=2 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}

[Severity: Medium]
Could this tight limit cause the test to become flaky?

By reducing --stop-sample to 2, the Python script processes exactly 3 events
before exiting. The first event is usually a Trace On discontinuity packet
which produces no disassembly.

If the following trace block is interrupted by a kernel interrupt before a
branch instruction occurs, the decoded block won't contain any branches. The
third event would just be another Trace On packet, and the script would exit
having produced an output file with zero branch instructions.

Since the test script runs with set -e, wouldn't the subsequent grep fail
and trigger a spurious test failure?
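
A minimal sketch of that failure mode, under stated assumptions: the
pattern is a simplified stand-in for branch_search, the file contents are
made up rather than taken from a real trace, and run_check is a
hypothetical helper. It only shows that a grep -q with no match ends a
script running under set -e before the following echo.

```shell
#!/bin/sh
# Simplified stand-in for the test's branch regex (assumption).
pattern='[[:space:]](bl|b|ret|cbz)([[:space:]]|$)'

# Hypothetical helper: run the grep step in a child shell with set -e,
# as the test script does, and return the child's exit status.
run_check() {
	sh -c 'set -e
	grep -q -E "$1" "$2"
	echo "Found branches"' check "$pattern" "$1" > /dev/null
}

tmp=$(mktemp)

# Output containing a branch: the child shell runs to completion.
printf '  0: d65f03c0  ret\n' > "$tmp"
run_check "$tmp" && with_branch=pass || with_branch=fail

# Output with no branch, e.g. a block cut short: set -e aborts the child.
printf '  0: 91000400  add  x0, x0, #0x1\n' > "$tmp"
run_check "$tmp" && without_branch=pass || without_branch=fail

rm -f "$tmp"
echo "with branch: $with_branch, without branch: $without_branch"
```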

>  	grep -q -E ${branch_search} ${file}
>  	echo "Found kernel branches"

[ ... ]

> @@ -55,9 +55,9 @@ fi
>  
>  ## Test user ##
>  echo "Testing userspace disassembly"
> -perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
> +perf record -o ${perfdata} -e cs_etm//u -Se -m,64K -- touch $file > /dev/null 2>&1
>  perf script -i ${perfdata} -s python:${script_path} -- \
> -	-d --stop-sample=30 2> /dev/null > ${file}
> +	-d --stop-sample=2 2> /dev/null > ${file}

[Severity: Medium]
Similarly, does this risk introducing flakiness in the userspace testing due
to early page faults?

As with the kernel trace, if early userspace execution hits a page fault in
ld.so or another exception before taking a branch, the limited events
processed won't include any branch disassembly.

Would slightly increasing the stop-sample limit avoid these intermittent
early execution trace interruptions while still remaining faster than 30?

>  grep -q -E ${branch_search} ${file}
>  echo "Found userspace branches"

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org?part=17

* [PATCH v2 18/18] perf test cs-etm: Move existing tests to coresight folder
  2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
                   ` (16 preceding siblings ...)
  2026-06-02 14:26 ` [PATCH v2 17/18] perf test cs-etm: Speed up disassembly test James Clark
@ 2026-06-02 14:27 ` James Clark
  17 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-02 14:27 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

There is already a subfolder for Coresight tests, so keep all of them
there.

Signed-off-by: James Clark <james.clark@linaro.org>
---
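For illustration only (not part of the patch), a small sketch of why the
relative path needs one more "../" after the move. It builds a throwaway
tree that mirrors the directory names in the diff below; the helper
script_path_for and the temporary layout are assumptions for this sketch.

```shell
#!/bin/sh
# Throwaway tree mirroring the directory names in this patch.
root=$(mktemp -d)
mkdir -p "$root/tests/shell/coresight" "$root/scripts/python"
: > "$root/scripts/python/arm-cs-trace-disasm.py"

# Hypothetical helper: join a test's directory, a run of "../" components
# and the script's location, the way script_path is built from "$0".
script_path_for() {
	echo "$(dirname "$1")/$2/scripts/python/arm-cs-trace-disasm.py"
}

moved="$root/tests/shell/coresight/test_arm_coresight_disasm.sh"

# Two levels up from the new location lands in tests/, which has no scripts/.
if [ -e "$(script_path_for "$moved" ../..)" ]; then old_ok=yes; else old_ok=no; fi
# Three levels up reaches the directory that holds scripts/python/.
if [ -e "$(script_path_for "$moved" ../../..)" ]; then new_ok=yes; else new_ok=no; fi

rm -rf "$root"
echo "two levels up: $old_ok, three levels up: $new_ok"
```
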
 MAINTAINERS                                                         | 1 -
 tools/perf/tests/shell/{ => coresight}/test_arm_coresight.sh        | 0
 tools/perf/tests/shell/{ => coresight}/test_arm_coresight_disasm.sh | 2 +-
 3 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7efb893edcbb..ff8935b459ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2749,7 +2749,6 @@ F:	tools/perf/arch/arm/util/auxtrace.c
 F:	tools/perf/arch/arm/util/cs-etm.c
 F:	tools/perf/arch/arm/util/cs-etm.h
 F:	tools/perf/arch/arm/util/pmu.c
-F:	tools/perf/tests/shell/*coresight*
 F:	tools/perf/tests/shell/coresight/*
 F:	tools/perf/util/cs-etm-decoder/*
 F:	tools/perf/util/cs-etm.*
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/coresight/test_arm_coresight.sh
similarity index 100%
rename from tools/perf/tests/shell/test_arm_coresight.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight.sh
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
similarity index 96%
rename from tools/perf/tests/shell/test_arm_coresight_disasm.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
index 2a400fd38a36..b196aab709f8 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
@@ -24,7 +24,7 @@ perfdata_dir=$(mktemp -d /tmp/__perf_test.perf.data.XXXXX)
 perfdata=${perfdata_dir}/perf.data
 file=$(mktemp /tmp/temporary_file.XXXXX)
 # Relative path works whether it's installed or running from repo
-script_path=$(dirname "$0")/../../scripts/python/arm-cs-trace-disasm.py
+script_path=$(dirname "$0")/../../../scripts/python/arm-cs-trace-disasm.py
 
 cleanup_files()
 {

-- 
2.34.1


end of thread, other threads:[~2026-06-02 15:26 UTC | newest]

Thread overview: 27+ messages
2026-06-02 14:26 [PATCH v2 00/18] perf cs-etm: Queue context packets for frontend James Clark
2026-06-02 14:26 ` [PATCH v2 01/18] " James Clark
2026-06-02 14:43   ` sashiko-bot
2026-06-02 14:26 ` [PATCH v2 02/18] perf test: Add workload-ctl option James Clark
2026-06-02 14:40   ` sashiko-bot
2026-06-02 14:26 ` [PATCH v2 03/18] perf test: Add a workload that forces context switches James Clark
2026-06-02 14:38   ` sashiko-bot
2026-06-02 14:26 ` [PATCH v2 04/18] perf test cs-etm: Test process attribution James Clark
2026-06-02 14:26 ` [PATCH v2 05/18] perf test: Add deterministic workload James Clark
2026-06-02 14:49   ` sashiko-bot
2026-06-02 14:26 ` [PATCH v2 06/18] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
2026-06-02 14:26 ` [PATCH v2 07/18] perf test cs-etm: Remove asm_pure_loop test James Clark
2026-06-02 14:26 ` [PATCH v2 08/18] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
2026-06-02 15:01   ` sashiko-bot
2026-06-02 14:26 ` [PATCH v2 09/18] perf test: Add named_threads workload James Clark
2026-06-02 15:01   ` sashiko-bot
2026-06-02 14:26 ` [PATCH v2 10/18] perf test cs-etm: Test decoding for concurrent threads test James Clark
2026-06-02 14:26 ` [PATCH v2 11/18] perf test cs-etm: Remove duplicate branch tests James Clark
2026-06-02 15:05   ` sashiko-bot
2026-06-02 14:26 ` [PATCH v2 12/18] perf test cs-etm: Reduce snapshot size James Clark
2026-06-02 14:26 ` [PATCH v2 13/18] perf test cs-etm: Speed up basic test James Clark
2026-06-02 14:26 ` [PATCH v2 14/18] perf test cs-etm: Remove unused Coresight workloads James Clark
2026-06-02 14:26 ` [PATCH v2 15/18] perf test cs-etm: Make disassembly test use kcore James Clark
2026-06-02 14:26 ` [PATCH v2 16/18] perf test cs-etm: Add all branch instructions to test James Clark
2026-06-02 14:26 ` [PATCH v2 17/18] perf test cs-etm: Speed up disassembly test James Clark
2026-06-02 15:26   ` sashiko-bot
2026-06-02 14:27 ` [PATCH v2 18/18] perf test cs-etm: Move existing tests to coresight folder James Clark
