[PATCH v5 00/19] perf cs-etm: Queue context packets for frontend

Linux Perf Users

* [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend
@ 2026-06-09 14:40 James Clark
  2026-06-09 14:40 ` [PATCH v5 01/19] " James Clark
                   ` (18 more replies)
  0 siblings, 19 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Fix thread tracking when decoding Coresight trace and add a new test for
it.

The new test is added as a Perf test workload instead of a custom binary
with its own build system, but this requires a new feature in Perf test
to pass in control pipes which can enable and disable events. This
scopes the recording to just the workload and helps to reduce the amount
of data recorded in tracing tests.
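
For readers unfamiliar with control pipes, the sketch below shows the
general idea using the protocol that perf record documents for
--control=fd:ctl-fd[,ack-fd]: the controlling side writes "enable" or
"disable" to the control descriptor and waits for "ack" on the ack
descriptor. How Perf test hands the descriptors to a workload is not
described in this cover letter, so ctl_send(), run_scoped() and
example_workload() are hypothetical names used for illustration only and
are not code from the series.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one newline-terminated command to the
 * control fd, then block until "ack\n" arrives on the ack fd.
 * Returns 0 on success and -1 on any I/O error or unexpected reply.
 */
int ctl_send(int ctl_fd, int ack_fd, const char *cmd)
{
	char ack[8] = { 0 };
	size_t len, got = 0;

	assert(cmd);
	len = strlen(cmd);

	if (write(ctl_fd, cmd, len) != (ssize_t)len)
		return -1;

	/* Read one byte at a time so a later ack is not consumed early. */
	while (got < sizeof(ack) - 1) {
		ssize_t n = read(ack_fd, ack + got, 1);

		if (n <= 0)
			return -1;
		if (ack[got++] == '\n')
			break;
	}

	return strcmp(ack, "ack\n") == 0 ? 0 : -1;
}

/* Placeholder for the code that should be the only thing recorded. */
void example_workload(void)
{
	volatile int sink = 0;

	for (int i = 0; i < 1000; i++)
		sink += i;
}

/* Enable events, run the workload, then disable events again. */
int run_scoped(int ctl_fd, int ack_fd, void (*workload)(void))
{
	if (ctl_send(ctl_fd, ack_fd, "enable\n"))
		return -1;

	workload();

	return ctl_send(ctl_fd, ack_fd, "disable\n");
}
```

Bracketing only the workload body this way keeps process startup and
teardown out of the recording, which is where the data reduction comes
from.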

With this new feature we can rewrite all of the Coresight tests to use
it and remove the remaining binaries, which fixes the following
issues:

 * They didn't work in out-of-source builds
 * Many of the tests unnecessarily required root and didn't skip
   without it
 * They were mainly qualitative tests which didn't look for specific
   behavior

Most importantly, the long build time and runtime have been reduced. On
a Radxa Orion O6, unroll_loop_thread.c took 37s to compile, which is
longer than the entire Perf build. Now the build time is negligible, and
the before and after runtimes for all the Coresight tests are:

          |   N1SDP   |   Orion O6
  -----------------------------------
  Before  |   4m  0s  |    14m 49s
  After   |      26s  |        56s
  -----------------------------------

Signed-off-by: James Clark <james.clark@linaro.org>
---
Changes in v5:
- Forgot to include this change:
  - Test for actual length of expected raw dump (Leo)
- Link to v4: https://lore.kernel.org/r/20260609-james-cs-context-tracking-fix-v4-0-44f9fb9e5c42@linaro.org

Changes in v4:
- Rename workload-ctl to record-ctl and improve docs (Leo)
- Use new packet argument everywhere in
  cs_etm__synth_instruction_sample() (Sashiko)
- Test for actual length of expected raw dump (Leo)
- Use -fno-inline instead of keyword (Leo)
- Don't test any brace or call lines in deterministic test
- Make sure context switch loop test does cleanup on failure (Sashiko)
- Remove undef int overflows in workloads (Sashiko)
- Link to v3: https://lore.kernel.org/r/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org

Changes in v3:
- Minor Sashiko comments
  - Close some more pipes
  - Fix warning messages
  - Error handling improvements
- Pass packet into cs_etm__synth_instruction_sample()
- Fixup stale comment (Leo)
- Link to v2: https://lore.kernel.org/r/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org

Changes in v2:
- Add --workload-ctl option to Perf test
- Re-write all the Coresight tests and speed them up
- Pass packet to memory access function so frontend can use either the
  previous or current packet's EL
- Link to v1: https://lore.kernel.org/r/20260526-james-cs-context-tracking-fix-v1-0-ebd602e18287@linaro.org

---
James Clark (19):
      perf cs-etm: Queue context packets for frontend
      perf test: Add workload-ctl option
      perf test: Add a workload that forces context switches
      perf test cs-etm: Test process attribution
      perf test: Add deterministic workload
      perf test cs-etm: Replace unroll loop thread with deterministic decode test
      perf test cs-etm: Remove asm_pure_loop test
      perf test cs-etm: Replace memcpy test with raw dump stress test
      perf test: Add named_threads workload
      perf test cs-etm: Test decoding for concurrent threads test
      perf test cs-etm: Remove duplicate branch tests
      perf test cs-etm: Skip if not root
      perf test cs-etm: Reduce snapshot size
      perf test cs-etm: Speed up basic test
      perf test cs-etm: Remove unused Coresight workloads
      perf test cs-etm: Make disassembly test use kcore
      perf test cs-etm: Add all branch instructions to test
      perf test cs-etm: Speed up disassembly test
      perf test cs-etm: Move existing tests to coresight folder

 Documentation/trace/coresight/coresight-perf.rst   |  78 +------
 MAINTAINERS                                        |   2 -
 tools/perf/Documentation/perf-test.txt             |  24 ++-
 tools/perf/Makefile.perf                           |  14 +-
 tools/perf/scripts/python/arm-cs-trace-disasm.py   |  20 +-
 tools/perf/tests/builtin-test.c                    | 187 +++++++++++++++-
 tools/perf/tests/shell/coresight/Makefile          |  29 ---
 .../perf/tests/shell/coresight/Makefile.miniconfig |  14 --
 tools/perf/tests/shell/coresight/asm_pure_loop.sh  |  22 --
 .../tests/shell/coresight/asm_pure_loop/.gitignore |   1 -
 .../tests/shell/coresight/asm_pure_loop/Makefile   |  34 ---
 .../shell/coresight/asm_pure_loop/asm_pure_loop.S  |  30 ---
 .../tests/shell/coresight/concurrent_threads.sh    |  45 ++++
 .../tests/shell/coresight/context_switch_thread.sh |  69 ++++++
 tools/perf/tests/shell/coresight/deterministic.sh  |  72 +++++++
 .../tests/shell/coresight/memcpy_thread/.gitignore |   1 -
 .../tests/shell/coresight/memcpy_thread/Makefile   |  33 ---
 .../shell/coresight/memcpy_thread/memcpy_thread.c  |  80 -------
 .../tests/shell/coresight/memcpy_thread_16k_10.sh  |  22 --
 .../perf/tests/shell/coresight/raw_dump_stress.sh  |  65 ++++++
 .../shell/{ => coresight}/test_arm_coresight.sh    |  43 ++--
 .../{ => coresight}/test_arm_coresight_disasm.sh   |  23 +-
 .../tests/shell/coresight/thread_loop/.gitignore   |   1 -
 .../tests/shell/coresight/thread_loop/Makefile     |  33 ---
 .../shell/coresight/thread_loop/thread_loop.c      |  85 --------
 .../shell/coresight/thread_loop_check_tid_10.sh    |  23 --
 .../shell/coresight/thread_loop_check_tid_2.sh     |  23 --
 .../shell/coresight/unroll_loop_thread/.gitignore  |   1 -
 .../shell/coresight/unroll_loop_thread/Makefile    |  33 ---
 .../unroll_loop_thread/unroll_loop_thread.c        |  75 -------
 .../tests/shell/coresight/unroll_loop_thread_10.sh |  22 --
 tools/perf/tests/shell/lib/coresight.sh            | 134 ------------
 tools/perf/tests/tests.h                           |   3 +
 tools/perf/tests/workloads/Build                   |   4 +
 tools/perf/tests/workloads/context_switch_loop.c   | 110 ++++++++++
 tools/perf/tests/workloads/deterministic.c         |  39 ++++
 tools/perf/tests/workloads/named_threads.c         | 109 ++++++++++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  21 +-
 tools/perf/util/cs-etm.c                           | 236 ++++++++++++---------
 tools/perf/util/cs-etm.h                           |   8 +-
 40 files changed, 926 insertions(+), 942 deletions(-)
---
base-commit: 351a37f2fda4db668cff8ba12f2992d73dccdaea
change-id: 20260515-james-cs-context-tracking-fix-754998bae7ed

Best regards,
-- 
James Clark <james.clark@linaro.org>



* [PATCH v5 01/19] perf cs-etm: Queue context packets for frontend
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 02/19] perf test: Add workload-ctl option James Clark
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

PE_CONTEXT elements update the context ID and exception level, but the
decoder may still have prior packets cached for frontend processing.
Updating the context immediately in the decoder backend can make those
cached packets get consumed with the wrong thread or EL state.

Add a CS_ETM_CONTEXT packet that carries the TID and EL to the frontend.
This keeps context changes ordered with the rest of the packet stream
and avoids mismatches when synthesizing samples from cached packets.
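
The ordering argument can be shown with a toy model. This is only a
sketch of the idea and not the perf implementation: struct pkt,
queue_push() and frontend_drain() are made-up names, and the queue is
far simpler than the real packet queue. A context change travels through
the same FIFO as the range packets, so a range queued before the change
is still attributed to the old thread.

```c
#include <assert.h>
#include <stddef.h>

enum pkt_type { PKT_RANGE, PKT_CONTEXT };

struct pkt {
	enum pkt_type type;
	int tid;		/* Only meaningful for PKT_CONTEXT */
};

#define QUEUE_MAX 16

struct pkt_queue {
	struct pkt buf[QUEUE_MAX];
	size_t head;
	size_t count;
};

/* Backend side: append a packet, returns -1 if the queue is full. */
int queue_push(struct pkt_queue *q, enum pkt_type type, int tid)
{
	assert(q);
	if (q->count == QUEUE_MAX)
		return -1;

	q->buf[(q->head + q->count) % QUEUE_MAX] = (struct pkt){ type, tid };
	q->count++;
	return 0;
}

/*
 * Frontend side: consume packets in order. A context packet updates
 * *frontend_tid, a range packet is attributed to whatever
 * *frontend_tid holds at that point. The tid chosen for each range is
 * stored in out[]; ranges beyond out_len are consumed but not stored.
 * Returns the number of entries written to out[].
 */
size_t frontend_drain(struct pkt_queue *q, int *frontend_tid,
		      int *out, size_t out_len)
{
	size_t n = 0;

	assert(q && frontend_tid && out);

	while (q->count) {
		const struct pkt *p = &q->buf[q->head];

		q->head = (q->head + 1) % QUEUE_MAX;
		q->count--;

		if (p->type == PKT_CONTEXT)
			*frontend_tid = p->tid;
		else if (n < out_len)
			out[n++] = *frontend_tid;
	}

	return n;
}
```

If the backend instead overwrote the shared tid as soon as it saw the
context element, the first range in a "range, context, range" sequence
would be drained with the new tid, which is the mismatch described
above.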

Separate the memory access function into one for the frontend and one
for decoding. The frontend also needs memory access to attach the
instruction to samples. Because the frontend does memory access for
both previous and current packets, change all the frontend memory access
function signatures to take both a tidq and a packet. The backend always
uses the current decode EL and thread (decode_el and decode_thread) from
the tidq.
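
A reduced model of that split is sketched below. It is not the code
from the diff: the types are simplified stand-ins, and mem_access() only
reports which context it was asked to use instead of reading from a DSO.
The point is that both wrappers share one helper but pick the thread and
EL from different places, so they can legitimately disagree at the same
moment.

```c
#include <assert.h>

enum ex_level { EL0, EL1, EL2 };

/* Stand-in for a packet: the EL is recorded per packet. */
struct packet {
	enum ex_level el;
};

/* Stand-in for a trace ID queue: separate decode and frontend state. */
struct tidq {
	int decode_tid;
	enum ex_level decode_el;
	int frontend_tid;
};

/* What the shared helper was asked to look up. */
struct access_ctx {
	int tid;
	enum ex_level el;
};

/* Shared helper: takes the context explicitly rather than from a tidq. */
struct access_ctx mem_access(int tid, enum ex_level el)
{
	return (struct access_ctx){ .tid = tid, .el = el };
}

/* Frontend: EL comes from the packet being processed (prev or current). */
struct access_ctx frontend_mem_access(const struct tidq *tidq,
				      const struct packet *packet)
{
	assert(tidq && packet);
	return mem_access(tidq->frontend_tid, packet->el);
}

/* Decoder: always uses the decode context, which may be ahead in time. */
struct access_ctx decoder_mem_access(const struct tidq *tidq)
{
	assert(tidq);
	return mem_access(tidq->decode_tid, tidq->decode_el);
}
```

Passing the packet explicitly is what lets a caller choose between the
previous and the current packet's EL without any extra per-queue state.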

Treat context packets as a boundary for branch sample generation and
remove tidq->prev_packet_thread. It's not possible to branch to a
different thread, so only the current thread needs to be tracked for
sample generation.

Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Reported-by: Amir Ayupov <aaupov@meta.com>
Closes: https://lore.kernel.org/linux-perf-users/20260515021135.1729028-1-aaupov@meta.com/
Co-authored-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  21 ++-
 tools/perf/util/cs-etm.c                        | 236 ++++++++++++++----------
 tools/perf/util/cs-etm.h                        |   8 +-
 3 files changed, 163 insertions(+), 102 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index dee3020ceaa9..26940f1f1b0b 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_queue *etmq,
 	packet_queue->packet_buffer[et].flags = 0;
 	packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
 	packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+	packet_queue->packet_buffer[et].el = ocsd_EL_unknown;
+	packet_queue->packet_buffer[et].tid = -1;
 
 	if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -449,6 +451,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
 	packet->last_instr_type = elem->last_i_type;
 	packet->last_instr_subtype = elem->last_i_subtype;
 	packet->last_instr_cond = elem->last_instr_cond;
+	packet->el = elem->context.exception_level;
 
 	if (elem->last_i_type == OCSD_INSTR_BR || elem->last_i_type == OCSD_INSTR_BR_INDIRECT)
 		packet->last_instr_taken_branch = elem->last_instr_exec;
@@ -525,7 +528,9 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 			const ocsd_generic_trace_elem *elem,
 			const uint8_t trace_chan_id)
 {
+	struct cs_etm_packet *packet;
 	pid_t tid = -1;
+	int ret;
 
 	/*
 	 * Process the PE_CONTEXT packets if we have a valid contextID or VMID.
@@ -546,12 +551,18 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 		break;
 	}
 
-	if (cs_etm__etmq_set_tid_el(etmq, tid, trace_chan_id,
-				    elem->context.exception_level))
+	if (cs_etm__etmq_update_decode_context(etmq, trace_chan_id,
+				elem->context.exception_level, tid))
 		return OCSD_RESP_FATAL_SYS_ERR;
 
-	if (tid == -1)
-		return OCSD_RESP_CONT;
+	ret = cs_etm_decoder__buffer_packet(etmq, packet_queue, trace_chan_id,
+					    CS_ETM_CONTEXT);
+	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+		return ret;
+
+	packet = &packet_queue->packet_buffer[packet_queue->tail];
+	packet->tid = tid;
+	packet->el = elem->context.exception_level;
 
 	/*
 	 * A timestamp is generated after a PE_CONTEXT element so make sure
@@ -559,7 +570,7 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
 	 */
 	cs_etm_decoder__reset_timestamp(packet_queue);
 
-	return OCSD_RESP_CONT;
+	return ret;
 }
 
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 40c6ddfa8c8d..5e92359f51a7 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -85,15 +85,22 @@ struct cs_etm_traceid_queue {
 	u64 period_instructions;
 	size_t last_branch_pos;
 	union perf_event *event_buf;
-	struct thread *thread;
-	struct thread *prev_packet_thread;
-	ocsd_ex_level prev_packet_el;
-	ocsd_ex_level el;
 	struct branch_stack *last_branch;
 	struct branch_stack *last_branch_rb;
 	struct cs_etm_packet *prev_packet;
 	struct cs_etm_packet *packet;
 	struct cs_etm_packet_queue packet_queue;
+
+	struct thread *decode_thread;
+	ocsd_ex_level decode_el;
+
+	/*
+	 * The frontend accesses the EL from '[prev_]packet' because it needs
+	 * previous EL for branch and current EL for instruction samples. It's
+	 * not possible to change thread in a single branch sample so no need to
+	 * store or access the thread through the packet.
+	 */
+	struct thread *frontend_thread;
 };
 
 enum cs_etm_format {
@@ -614,10 +621,11 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 
 	queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
 	tidq->trace_chan_id = trace_chan_id;
-	tidq->el = tidq->prev_packet_el = ocsd_EL_unknown;
-	tidq->thread = machine__findnew_thread(&etm->session->machines.host, -1,
+	tidq->decode_el = ocsd_EL_unknown;
+	tidq->frontend_thread = machine__findnew_thread(&etm->session->machines.host, -1,
+					       queue->tid);
+	tidq->decode_thread = machine__findnew_thread(&etm->session->machines.host, -1,
 					       queue->tid);
-	tidq->prev_packet_thread = machine__idle_thread(&etm->session->machines.host);
 
 	tidq->packet = zalloc(sizeof(struct cs_etm_packet));
 	if (!tidq->packet)
@@ -750,21 +758,10 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
 		/*
 		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 		 * the next incoming packet.
-		 *
-		 * Threads and exception levels are also tracked for both the
-		 * previous and current packets. This is because the previous
-		 * packet is used for the 'from' IP for branch samples, so the
-		 * thread at that time must also be assigned to that sample.
-		 * Across discontinuity packets the thread can change, so by
-		 * tracking the thread for the previous packet the branch sample
-		 * will have the correct info.
 		 */
 		tmp = tidq->packet;
 		tidq->packet = tidq->prev_packet;
 		tidq->prev_packet = tmp;
-		tidq->prev_packet_el = tidq->el;
-		thread__put(tidq->prev_packet_thread);
-		tidq->prev_packet_thread = thread__get(tidq->thread);
 	}
 }
 
@@ -937,8 +934,8 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 
 		/* Free this traceid_queue from the array */
 		tidq = etmq->traceid_queues[idx];
-		thread__zput(tidq->thread);
-		thread__zput(tidq->prev_packet_thread);
+		thread__zput(tidq->frontend_thread);
+		thread__zput(tidq->decode_thread);
 		zfree(&tidq->event_buf);
 		zfree(&tidq->last_branch);
 		zfree(&tidq->last_branch_rb);
@@ -1083,47 +1080,43 @@ static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address,
 	}
 }
 
-static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
-			      u64 address, size_t size, u8 *buffer,
-			      const ocsd_mem_space_acc_t mem_space)
+static u32 __cs_etm__mem_access(struct cs_etm_queue *etmq,
+				u64 address, size_t size, u8 *buffer,
+				const ocsd_mem_space_acc_t mem_space,
+				ocsd_ex_level el, struct thread *thread)
 {
 	u8  cpumode;
 	u64 offset;
 	int len;
 	struct addr_location al;
 	struct dso *dso;
-	struct cs_etm_traceid_queue *tidq;
 	int ret = 0;
 
 	if (!etmq)
 		return 0;
 
 	addr_location__init(&al);
-	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
-	if (!tidq)
-		goto out;
 
 	/*
-	 * We've already tracked EL along side the PID in cs_etm__set_thread()
-	 * so double check that it matches what OpenCSD thinks as well. It
-	 * doesn't distinguish between EL0 and EL1 for this mem access callback
-	 * so we had to do the extra tracking. Skip validation if it's any of
-	 * the 'any' values.
+	 * We track EL for the frontend and the backend when receiving context
+	 * and range packets. OpenCSD doesn't distinguish between EL0 and EL1
+	 * for this mem access callback so we had to do the extra tracking. Skip
+	 * validation if it's any of the 'any' values.
 	 */
 	if (!(mem_space == OCSD_MEM_SPACE_ANY ||
 	      mem_space == OCSD_MEM_SPACE_N || mem_space == OCSD_MEM_SPACE_S)) {
 		if (mem_space & OCSD_MEM_SPACE_EL1N) {
 			/* Includes both non secure EL1 and EL0 */
-			assert(tidq->el == ocsd_EL1 || tidq->el == ocsd_EL0);
+			assert(el == ocsd_EL1 || el == ocsd_EL0);
 		} else if (mem_space & OCSD_MEM_SPACE_EL2)
-			assert(tidq->el == ocsd_EL2);
+			assert(el == ocsd_EL2);
 		else if (mem_space & OCSD_MEM_SPACE_EL3)
-			assert(tidq->el == ocsd_EL3);
+			assert(el == ocsd_EL3);
 	}
 
-	cpumode = cs_etm__cpu_mode(etmq, address, tidq->el);
+	cpumode = cs_etm__cpu_mode(etmq, address, el);
 
-	if (!thread__find_map(tidq->thread, cpumode, address, &al))
+	if (!thread__find_map(thread, cpumode, address, &al))
 		goto out;
 
 	dso = map__dso(al.map);
@@ -1138,7 +1131,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
 
 	map__load(al.map);
 
-	len = dso__data_read_offset(dso, maps__machine(thread__maps(tidq->thread)),
+	len = dso__data_read_offset(dso, maps__machine(thread__maps(thread)),
 				    offset, buffer, size);
 
 	if (len <= 0) {
@@ -1158,6 +1151,30 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
 	return ret;
 }
 
+static u32 cs_etm__frontend_mem_access(struct cs_etm_queue *etmq,
+				       struct cs_etm_traceid_queue *tidq,
+				       struct cs_etm_packet *packet,
+				       u64 address, size_t size, u8 *buffer)
+{
+	return __cs_etm__mem_access(etmq, address, size, buffer, 0, packet->el,
+				    tidq->frontend_thread);
+}
+
+static u32 cs_etm__decoder_mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
+				      u64 address, size_t size, u8 *buffer,
+				      const ocsd_mem_space_acc_t mem_space)
+{
+	struct cs_etm_traceid_queue *tidq;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+	if (!tidq)
+		return 0;
+
+	return __cs_etm__mem_access(etmq, address, size, buffer,
+				    mem_space, tidq->decode_el,
+				    tidq->decode_thread);
+}
+
 static struct cs_etm_queue *cs_etm__alloc_queue(void)
 {
 	struct cs_etm_queue *etmq = zalloc(sizeof(*etmq));
@@ -1333,12 +1350,13 @@ void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq)
 }
 
 static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
-					 u8 trace_chan_id, u64 addr)
+					 struct cs_etm_traceid_queue *tidq,
+					 struct cs_etm_packet *packet, u64 addr)
 {
 	u8 instrBytes[2];
 
-	cs_etm__mem_access(etmq, trace_chan_id, addr, ARRAY_SIZE(instrBytes),
-			   instrBytes, 0);
+	cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+				    ARRAY_SIZE(instrBytes), instrBytes);
 	/*
 	 * T32 instruction size is indicated by bits[15:11] of the first
 	 * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
@@ -1371,16 +1389,16 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
 }
 
 static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
-				     u64 trace_chan_id,
-				     const struct cs_etm_packet *packet,
+				     struct cs_etm_traceid_queue *tidq,
+				     struct cs_etm_packet *packet,
 				     u64 offset)
 {
 	if (packet->isa == CS_ETM_ISA_T32) {
 		u64 addr = packet->start_addr;
 
 		while (offset) {
-			addr += cs_etm__t32_instr_size(etmq,
-						       trace_chan_id, addr);
+			addr += cs_etm__t32_instr_size(etmq, tidq, packet,
+						       addr);
 			offset--;
 		}
 		return addr;
@@ -1490,34 +1508,51 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
 	return etmq->buf_len;
 }
 
-static void cs_etm__set_thread(struct cs_etm_queue *etmq,
-			       struct cs_etm_traceid_queue *tidq, pid_t tid,
-			       ocsd_ex_level el)
+/*
+ * Convert a raw thread number to a thread struct and assign it to **thread.
+ */
+static int cs_etm__etmq_update_thread(struct cs_etm_queue *etmq,
+				      ocsd_ex_level el, pid_t tid,
+				      struct thread **thread)
 {
 	struct machine *machine = cs_etm__get_machine(etmq, el);
 
+	if (!machine || !*thread)
+		return -EINVAL;
+
 	if (tid != -1) {
-		thread__zput(tidq->thread);
-		tidq->thread = machine__find_thread(machine, -1, tid);
+		thread__zput(*thread);
+		*thread = machine__find_thread(machine, -1, tid);
 	}
 
 	/* Couldn't find a known thread */
-	if (!tidq->thread)
-		tidq->thread = machine__idle_thread(machine);
+	if (!*thread)
+		*thread = machine__idle_thread(machine);
 
-	tidq->el = el;
+	return 0;
 }
 
-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
-			    u8 trace_chan_id, ocsd_ex_level el)
+/*
+ * Set the thread and EL of the decode context which is ahead in time of the
+ * frontend context.
+ */
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+				       u8 trace_chan_id,
+				       ocsd_ex_level el, pid_t tid)
 {
 	struct cs_etm_traceid_queue *tidq;
+	int ret;
 
 	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
 	if (!tidq)
 		return -EINVAL;
 
-	cs_etm__set_thread(etmq, tidq, tid, el);
+	ret = cs_etm__etmq_update_thread(etmq, el, tid,
+					 &tidq->decode_thread);
+	if (ret)
+		return ret;
+
+	tidq->decode_el = el;
 	return 0;
 }
 
@@ -1527,8 +1562,8 @@ bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq)
 }
 
 static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
-			      u64 trace_chan_id,
-			      const struct cs_etm_packet *packet,
+			      struct cs_etm_traceid_queue *tidq,
+			      struct cs_etm_packet *packet,
 			      struct perf_sample *sample)
 {
 	/*
@@ -1545,14 +1580,14 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
 	 * cs_etm__t32_instr_size().
 	 */
 	if (packet->isa == CS_ETM_ISA_T32)
-		sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
+		sample->insn_len = cs_etm__t32_instr_size(etmq, tidq, packet,
 							  sample->ip);
 	/* Otherwise, A64 and A32 instruction size are always 32-bit. */
 	else
 		sample->insn_len = 4;
 
-	cs_etm__mem_access(etmq, trace_chan_id, sample->ip, sample->insn_len,
-			   (void *)sample->insn, 0);
+	cs_etm__frontend_mem_access(etmq, tidq, packet, sample->ip,
+				    sample->insn_len, (void *)sample->insn);
 }
 
 u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp)
@@ -1579,6 +1614,7 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
 
 static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 					    struct cs_etm_traceid_queue *tidq,
+					    struct cs_etm_packet *packet,
 					    u64 addr, u64 period)
 {
 	int ret = 0;
@@ -1588,23 +1624,23 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	perf_sample__init(&sample, /*all=*/true);
 	event->sample.header.type = PERF_RECORD_SAMPLE;
-	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
+	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
 
 	sample.ip = addr;
-	sample.pid = thread__pid(tidq->thread);
-	sample.tid = thread__tid(tidq->thread);
+	sample.pid = thread__pid(tidq->frontend_thread);
+	sample.tid = thread__tid(tidq->frontend_thread);
 	sample.id = etmq->etm->instructions_id;
 	sample.stream_id = etmq->etm->instructions_id;
 	sample.period = period;
-	sample.cpu = tidq->packet->cpu;
+	sample.cpu = packet->cpu;
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
+	cs_etm__copy_insn(etmq, tidq, packet, &sample);
 
 	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
@@ -1649,15 +1685,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
-						     tidq->prev_packet_el);
+						     tidq->prev_packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
 
 	sample.ip = ip;
-	sample.pid = thread__pid(tidq->prev_packet_thread);
-	sample.tid = thread__tid(tidq->prev_packet_thread);
+	sample.pid = thread__pid(tidq->frontend_thread);
+	sample.tid = thread__tid(tidq->frontend_thread);
 	sample.addr = cs_etm__first_executed_instr(tidq->packet);
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
@@ -1666,8 +1702,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet,
-			  &sample);
+	cs_etm__copy_insn(etmq, tidq, tidq->prev_packet, &sample);
 
 	/*
 	 * perf report cannot handle events without a branch stack
@@ -1788,7 +1823,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	int ret;
-	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 instrs_prev;
 
 	/* Get instructions remainder from previous packet */
@@ -1874,10 +1908,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * been executed, but PC has not advanced to next
 			 * instruction)
 			 */
-			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+			addr = cs_etm__instr_addr(etmq, tidq, tidq->packet,
+						  offset - 1);
 			ret = cs_etm__synth_instruction_sample(
-				etmq, tidq, addr,
+				etmq, tidq, tidq->packet, addr,
 				etm->instructions_sample_period);
 			if (ret)
 				return ret;
@@ -1959,7 +1993,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 		addr = cs_etm__last_executed_instr(tidq->prev_packet);
 
 		err = cs_etm__synth_instruction_sample(
-			etmq, tidq, addr,
+			etmq, tidq, tidq->prev_packet, addr,
 			tidq->period_instructions);
 		if (err)
 			return err;
@@ -2014,7 +2048,7 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
 		addr = cs_etm__last_executed_instr(tidq->prev_packet);
 
 		err = cs_etm__synth_instruction_sample(
-			etmq, tidq, addr,
+			etmq, tidq, tidq->prev_packet, addr,
 			tidq->period_instructions);
 		if (err)
 			return err;
@@ -2051,9 +2085,9 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
 	return etmq->buf_len;
 }
 
-static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
-				 struct cs_etm_packet *packet,
-				 u64 end_addr)
+static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq,
+				 struct cs_etm_traceid_queue *tidq,
+				 struct cs_etm_packet *packet, u64 end_addr)
 {
 	/* Initialise to keep compiler happy */
 	u16 instr16 = 0;
@@ -2075,8 +2109,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * so below only read 2 bytes as instruction size for T32.
 		 */
 		addr = end_addr - 2;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr16),
-				   (u8 *)&instr16, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr16), (u8 *)&instr16);
 		if ((instr16 & 0xFF00) == 0xDF00)
 			return true;
 
@@ -2091,8 +2125,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * +---------+---------+-------------------------+
 		 */
 		addr = end_addr - 4;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
-				   (u8 *)&instr32, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr32), (u8 *)&instr32);
 		if ((instr32 & 0x0F000000) == 0x0F000000 &&
 		    (instr32 & 0xF0000000) != 0xF0000000)
 			return true;
@@ -2108,8 +2142,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 		 * +-----------------------+---------+-----------+
 		 */
 		addr = end_addr - 4;
-		cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
-				   (u8 *)&instr32, 0);
+		cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+					    sizeof(instr32), (u8 *)&instr32);
 		if ((instr32 & 0xFFE0001F) == 0xd4000001)
 			return true;
 
@@ -2125,7 +2159,6 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
 static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
 			       struct cs_etm_traceid_queue *tidq, u64 magic)
 {
-	u8 trace_chan_id = tidq->trace_chan_id;
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;
 
@@ -2140,7 +2173,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
 	 */
 	if (magic == __perf_cs_etmv4_magic) {
 		if (packet->exception_number == CS_ETMV4_EXC_CALL &&
-		    cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+		    cs_etm__is_svc_instr(etmq, tidq, prev_packet,
 					 prev_packet->end_addr))
 			return true;
 	}
@@ -2178,7 +2211,6 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
 				      struct cs_etm_traceid_queue *tidq,
 				      u64 magic)
 {
-	u8 trace_chan_id = tidq->trace_chan_id;
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;
 
@@ -2204,7 +2236,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
 		 * (SMC, HVC) are taken as sync exceptions.
 		 */
 		if (packet->exception_number == CS_ETMV4_EXC_CALL &&
-		    !cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+		    !cs_etm__is_svc_instr(etmq, tidq, prev_packet,
 					  prev_packet->end_addr))
 			return true;
 
@@ -2228,7 +2260,6 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 {
 	struct cs_etm_packet *packet = tidq->packet;
 	struct cs_etm_packet *prev_packet = tidq->prev_packet;
-	u8 trace_chan_id = tidq->trace_chan_id;
 	u64 magic;
 	int ret;
 
@@ -2309,11 +2340,11 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 		if (prev_packet->flags == (PERF_IP_FLAG_BRANCH |
 					   PERF_IP_FLAG_RETURN |
 					   PERF_IP_FLAG_INTERRUPT) &&
-		    cs_etm__is_svc_instr(etmq, trace_chan_id,
-					 packet, packet->start_addr))
+		    cs_etm__is_svc_instr(etmq, tidq, packet, packet->start_addr)) {
 			prev_packet->flags = PERF_IP_FLAG_BRANCH |
 					     PERF_IP_FLAG_RETURN |
 					     PERF_IP_FLAG_SYSCALLRET;
+		}
 		break;
 	case CS_ETM_DISCONTINUITY:
 		/*
@@ -2394,6 +2425,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
 					     PERF_IP_FLAG_RETURN |
 					     PERF_IP_FLAG_INTERRUPT;
 		break;
+	case CS_ETM_CONTEXT:
 	case CS_ETM_EMPTY:
 	default:
 		break;
@@ -2469,6 +2501,19 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 			 */
 			cs_etm__sample(etmq, tidq);
 			break;
+		case CS_ETM_CONTEXT:
+			/*
+			 * Update context but don't swap packet. Keep the
+			 * previous one for branch source address info, if
+			 * tracing the kernel the context packet will be emitted
+			 * between two ranges.
+			 */
+			ret = cs_etm__etmq_update_thread(etmq, tidq->packet->el,
+							 tidq->packet->tid,
+							 &tidq->frontend_thread);
+			if (ret)
+				goto out;
+			break;
 		case CS_ETM_EXCEPTION:
 		case CS_ETM_EXCEPTION_RET:
 			/*
@@ -2497,6 +2542,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
 		}
 	}
 
+out:
 	return ret;
 }
 
@@ -2620,7 +2666,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 			if (!tidq)
 				continue;
 
-			if (tid == -1 || thread__tid(tidq->thread) == tid)
+			if (tid == -1 || thread__tid(tidq->frontend_thread) == tid)
 				cs_etm__run_per_thread_timeless_decoder(etmq);
 		} else
 			cs_etm__run_per_cpu_timeless_decoder(etmq);
@@ -3328,7 +3374,7 @@ static int cs_etm__create_queue_decoders(struct cs_etm_queue *etmq)
 	 */
 	if (cs_etm_decoder__add_mem_access_cb(etmq->decoder,
 					      0x0L, ((u64) -1L),
-					      cs_etm__mem_access))
+					      cs_etm__decoder_mem_access))
 		goto out_free_decoder;
 
 	zfree(&t_params);
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index aa9bb4a32eca..b81099c2b301 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -158,6 +158,7 @@ enum cs_etm_sample_type {
 	CS_ETM_DISCONTINUITY,
 	CS_ETM_EXCEPTION,
 	CS_ETM_EXCEPTION_RET,
+	CS_ETM_CONTEXT,
 };
 
 enum cs_etm_isa {
@@ -184,6 +185,8 @@ struct cs_etm_packet {
 	u8 last_instr_size;
 	u8 trace_chan_id;
 	int cpu;
+	int el;
+	pid_t tid;
 };
 
 #define CS_ETM_PACKET_MAX_BUFFER 1024
@@ -259,8 +262,9 @@ enum cs_etm_pid_fmt {
 #include <opencsd/ocsd_if_types.h>
 int cs_etm__get_cpu(struct cs_etm_queue *etmq, u8 trace_chan_id, int *cpu);
 enum cs_etm_pid_fmt cs_etm__get_pid_fmt(struct cs_etm_queue *etmq);
-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
-			    u8 trace_chan_id, ocsd_ex_level el);
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+				       u8 trace_chan_id, ocsd_ex_level el,
+				       pid_t tid);
 bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq);
 void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
 					      u8 trace_chan_id);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 02/19] perf test: Add workload-ctl option
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
  2026-06-09 14:40 ` [PATCH v5 01/19] " James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 03/19] perf test: Add a workload that forces context switches James Clark
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Add a --record-ctl=fifo:ctl-fifo[,ack-fifo] option for 'perf test
-w'. When set, run_workload() opens the named FIFOs, writes 'enable'
before invoking the builtin workload and writes 'disable' after it
returns. If an ack FIFO is provided, it waits for an ack after each
command so that the workload doesn't run until the events are enabled.

This can be used to limit the scope of the recording to only the
workload execution and avoid recording Perf setup and teardown code if
Perf record is started with events disabled (-D -1).
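
As a rough illustration of the handshake described above (not the code
in this patch), the sketch below sends one command and optionally waits
for the ack. The name ctl_send() is hypothetical, and anonymous pipes
can stand in for the named FIFOs when trying it out:

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one command such as "enable\n" to ctl_fd.
 * If ack_fd is valid, block until the peer answers and check that the
 * answer is "ack". Returns 0 on success and -1 on any failure.
 */
int ctl_send(int ctl_fd, int ack_fd, const char *cmd)
{
	size_t len = strlen(cmd);
	char buf[16];
	char *nl;
	ssize_t n;

	/* Write the whole command, retrying on EINTR and partial writes */
	while (len) {
		n = write(ctl_fd, cmd, len);
		if (n <= 0) {
			if (n < 0 && errno == EINTR)
				continue;
			return -1;
		}
		cmd += n;
		len -= n;
	}

	/* The ack FIFO is optional */
	if (ack_fd < 0)
		return 0;

	do {
		n = read(ack_fd, buf, sizeof(buf) - 1);
	} while (n < 0 && errno == EINTR);

	if (n <= 0)
		return -1;

	/* Strip the trailing newline before comparing */
	buf[n] = '\0';
	nl = strchr(buf, '\n');
	if (nl)
		*nl = '\0';

	return strcmp(buf, "ack") ? -1 : 0;
}
```

Waiting for the ack before starting the workload is what keeps the
first instructions of the workload inside the recorded window.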

Assisted-by: Codex:GPT-5.5
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt |  12 +++
 tools/perf/tests/builtin-test.c        | 184 ++++++++++++++++++++++++++++++++-
 2 files changed, 194 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 32da0d1fa86a..2f4a91f5b9dc 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -69,3 +69,15 @@ OPTIONS
 
 --list-workloads::
 	List the available workloads to use with -w/--workload.
+
+--record-ctl=fifo:ctl-fifo[,ack-fifo]::
+	This option is used to communicate with a perf record session in order
+	to control the recording scope to only the workload and avoid recording
+	setup and teardown code. When specifying this option, the same FIFO path
+	must be specified in the record session via:
+
+	  perf record -D -1 --control=fifo:ctl-fifo[,ack-fifo] ...
+
+	Perf test sends 'enable' and 'disable' commands through ctl-fifo to
+	control event recording. If 'ack-fifo' is provided, the workload runner
+	waits for an 'ack' response after each command.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b64fc2204f22..86ea427eb0aa 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -56,6 +56,7 @@ static unsigned int runs_per_test = 1;
 static unsigned int failure_snippet_lines = 10;
 const char *dso_to_test;
 const char *test_objdump_path = "objdump";
+static const char *workload_control;
 
 /*
  * List of architecture specific tests. Not a weak symbol as the array length is
@@ -168,6 +169,11 @@ static struct test_workload *workloads[] = {
 #endif
 };
 
+struct workload_control {
+	int ctl_fd;
+	int ack_fd;
+};
+
 #define workloads__for_each(workload) \
 	for (unsigned i = 0; i < ARRAY_SIZE(workloads) && ({ workload = workloads[i]; 1; }); i++)
 
@@ -1387,13 +1393,185 @@ static int workloads__fprintf_list(FILE *fp)
 	return printed;
 }
 
+static int perf_control_open_fifo(struct workload_control *ctl, const char *str)
+{
+	char *s, *p;
+	int ret;
+
+	if (strncmp(str, "fifo:", 5))
+		return -EINVAL;
+
+	str += 5;
+	if (!*str || *str == ',')
+		return -EINVAL;
+
+	s = strdup(str);
+	if (!s)
+		return -ENOMEM;
+
+	p = strchr(s, ',');
+	if (p)
+		*p = '\0';
+
+	ctl->ctl_fd = open(s, O_WRONLY | O_CLOEXEC);
+	if (ctl->ctl_fd < 0) {
+		ret = -errno;
+		pr_err("Failed to open workload control FIFO '%s': %m\n", s);
+		free(s);
+		return ret;
+	}
+
+	if (p && *++p) {
+		ctl->ack_fd = open(p, O_RDONLY | O_CLOEXEC);
+		if (ctl->ack_fd < 0) {
+			ret = -errno;
+			pr_err("Failed to open workload control ack FIFO '%s': %m\n", p);
+			close(ctl->ctl_fd);
+			ctl->ctl_fd = -1;
+			free(s);
+			return ret;
+		}
+	}
+
+	free(s);
+	return 0;
+}
+
+static int perf_control_open(struct workload_control *ctl)
+{
+	int ret;
+
+	if (!workload_control)
+		return 0;
+
+	ret = perf_control_open_fifo(ctl, workload_control);
+
+	if (ret == -EINVAL) {
+		pr_err("Unsupported workload control spec '%s', expected fifo:ctl-fifo[,ack-fifo]\n",
+			workload_control);
+	}
+
+	return ret;
+}
+
+static void perf_control_close(struct workload_control *ctl)
+{
+	if (ctl->ctl_fd >= 0) {
+		close(ctl->ctl_fd);
+		ctl->ctl_fd = -1;
+	}
+	if (ctl->ack_fd >= 0) {
+		close(ctl->ack_fd);
+		ctl->ack_fd = -1;
+	}
+}
+
+static int perf_control_write_cmd(int fd, const char *cmd)
+{
+	size_t len = strlen(cmd);
+	ssize_t ret;
+
+	while (len) {
+		ret = write(fd, cmd, len);
+		if (ret < 0) {
+			if (errno == EINTR)
+				continue;
+			pr_err("Failed to write perf control command: %m\n");
+			return -1;
+		}
+
+		if (!ret) {
+			pr_err("Failed to write perf control command: short write\n");
+			return -1;
+		}
+
+		cmd += ret;
+		len -= ret;
+	}
+
+	return 0;
+}
+
+static int perf_control_read_ack(int fd)
+{
+	char buf[16];
+	ssize_t ret;
+
+	do {
+		ret = read(fd, buf, sizeof(buf) - 1);
+	} while (ret < 0 && errno == EINTR);
+
+	if (ret < 0) {
+		pr_err("Failed to read perf control ack: %m\n");
+		return -1;
+	}
+
+	if (!ret) {
+		pr_err("Unexpected EOF while reading perf control ack\n");
+		return -1;
+	}
+
+	buf[ret] = '\0';
+	for (ssize_t i = 0; i < ret; i++) {
+		if (buf[i] == '\n' || buf[i] == '\0') {
+			buf[i] = '\0';
+			break;
+		}
+	}
+
+	if (strcmp(buf, "ack")) {
+		pr_err("Unexpected perf control ack: %s\n", buf);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int perf_control_send(struct workload_control *ctl, const char *cmd)
+{
+	if (ctl->ctl_fd < 0)
+		return 0;
+
+	if (perf_control_write_cmd(ctl->ctl_fd, cmd))
+		return -1;
+
+	if (ctl->ack_fd >= 0 && perf_control_read_ack(ctl->ack_fd))
+		return -1;
+
+	return 0;
+}
+
 static int run_workload(const char *work, int argc, const char **argv)
 {
 	struct test_workload *twl;
 
 	workloads__for_each(twl) {
-		if (!strcmp(twl->name, work))
-			return twl->func(argc, argv);
+		struct workload_control ctl = {
+			.ctl_fd = -1,
+			.ack_fd = -1,
+		};
+		int control_ret, ret;
+
+		if (strcmp(twl->name, work))
+			continue;
+
+		ret = perf_control_open(&ctl);
+		if (ret)
+			return ret;
+
+		if (perf_control_send(&ctl, "enable\n")) {
+			perf_control_close(&ctl);
+			return -1;
+		}
+
+		ret = twl->func(argc, argv);
+
+		control_ret = perf_control_send(&ctl, "disable\n");
+		perf_control_close(&ctl);
+		if (control_ret)
+			return -1;
+
+		return ret;
 	}
 
 	pr_info("No workload found: %s\n", work);
@@ -1486,6 +1664,8 @@ int cmd_test(int argc, const char **argv)
 	OPT_UINTEGER('r', "runs-per-test", &runs_per_test,
 		     "Run each test the given number of times, default 1"),
 	OPT_STRING('w', "workload", &workload, "work", "workload to run for testing, use '--list-workloads' to list the available ones."),
+	OPT_STRING(0, "record-ctl", &workload_control, "fifo:ctl-fifo[,ack-fifo]",
+		   "Write enable to the fifo just before running the workload and disable after, with optional ack from ack-fifo"),
 	OPT_BOOLEAN(0, "list-workloads", &list_workloads, "List the available builtin workloads to use with -w/--workload"),
 	OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"),
 	OPT_STRING(0, "objdump", &test_objdump_path, "path",

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 03/19] perf test: Add a workload that forces context switches
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
  2026-06-09 14:40 ` [PATCH v5 01/19] " James Clark
  2026-06-09 14:40 ` [PATCH v5 02/19] perf test: Add workload-ctl option James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 04/19] perf test cs-etm: Test process attribution James Clark
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

This workload launches two processes that block when reading from and
writing to each other, forcing the other process to be scheduled for
each read/write pair.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt           |   7 +-
 tools/perf/tests/builtin-test.c                  |   1 +
 tools/perf/tests/tests.h                         |   1 +
 tools/perf/tests/workloads/Build                 |   1 +
 tools/perf/tests/workloads/context_switch_loop.c | 110 +++++++++++++++++++++++
 5 files changed, 117 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 2f4a91f5b9dc..213eb62603eb 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -55,15 +55,16 @@ OPTIONS
 
 -w::
 --workload=::
-	Run a built-in workload, to list them use '--list-workloads', current ones include:
-	noploop, thloop, leafloop, sqrtloop, brstack, datasym and landlock.
+	Run a built-in workload, to list them use '--list-workloads', current
+	ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
+	context_switch_loop and landlock.
 
 	Used with the shell script regression tests.
 
 	Some accept an extra parameter:
 
 		seconds: leafloop, noploop, sqrtloop, thloop
-		nrloops: brstack
+		nrloops: brstack, context_switch_loop
 
 	The datasym and landlock workloads don't accept any.
 
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 86ea427eb0aa..9284f897de3c 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -163,6 +163,7 @@ static struct test_workload *workloads[] = {
 	&workload__traploop,
 	&workload__inlineloop,
 	&workload__jitdump,
+	&workload__context_switch_loop,
 
 #ifdef HAVE_RUST_SUPPORT
 	&workload__code_with_type,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index bf8ff7d54727..7cd4da4e96d3 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -245,6 +245,7 @@ DECLARE_WORKLOAD(landlock);
 DECLARE_WORKLOAD(traploop);
 DECLARE_WORKLOAD(inlineloop);
 DECLARE_WORKLOAD(jitdump);
+DECLARE_WORKLOAD(context_switch_loop);
 
 #ifdef HAVE_RUST_SUPPORT
 DECLARE_WORKLOAD(code_with_type);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 0eb6d99528eb..7134a031cb7c 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -10,6 +10,7 @@ perf-test-y += landlock.o
 perf-test-y += traploop.o
 perf-test-y += inlineloop.o
 perf-test-y += jitdump.o
+perf-test-y += context_switch_loop.o
 
 ifeq ($(CONFIG_RUST_SUPPORT),y)
     perf-test-y += code_with_type.o
diff --git a/tools/perf/tests/workloads/context_switch_loop.c b/tools/perf/tests/workloads/context_switch_loop.c
new file mode 100644
index 000000000000..5431af6147e6
--- /dev/null
+++ b/tools/perf/tests/workloads/context_switch_loop.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/compiler.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../tests.h"
+
+static int loops = 100;
+static char buf;
+int context_switch_loop_work = 1234;
+
+#define write_block(fd) \
+	do { \
+		if (write(fd, &buf, 1) <= 0) \
+			return 1; \
+	} while (0)
+
+#define read_block(fd) \
+	do { \
+		if (read(fd, &buf, 1) <= 0) \
+			return 1; \
+	} while (0)
+
+/* Not static to avoid LTO clobbering the function name */
+int context_switch_loop_proc1(int in_fd, int out_fd);
+int context_switch_loop_proc1(int in_fd, int out_fd)
+{
+	for (int i = 0; i < loops; i++) {
+		read_block(in_fd);
+		context_switch_loop_work += i * 3;
+		write_block(out_fd);
+	}
+	return 0;
+}
+
+int context_switch_loop_proc2(int in_fd, int out_fd);
+int context_switch_loop_proc2(int in_fd, int out_fd)
+{
+	for (int i = 0; i < loops; i++) {
+		write_block(out_fd);
+		context_switch_loop_work += i * 7;
+		read_block(in_fd);
+	}
+	return 0;
+}
+
+/*
+ * Launches two processes that take turns to execute a multiplication N times
+ */
+static int context_switch_loop(int argc, const char **argv)
+{
+	int a_to_b[2], b_to_a[2];
+	pid_t proc1_pid;
+	int status;
+	int ret;
+
+	if (argc > 0) {
+		loops = atoi(argv[0]);
+		if (loops < 0) {
+			fprintf(stderr, "Invalid number of loops: %s\n", argv[0]);
+			return 1;
+		}
+	}
+
+	if (pipe(a_to_b) || pipe(b_to_a)) {
+		perror("Pipe error");
+		return 1;
+	}
+
+	proc1_pid = fork();
+	if (proc1_pid < 0) {
+		perror("Fork error");
+		return 1;
+	}
+
+	if (!proc1_pid) {
+		close(a_to_b[0]);
+		close(b_to_a[1]);
+		prctl(PR_SET_NAME, "proc1", 0, 0, 0);
+		ret = context_switch_loop_proc1(b_to_a[0], a_to_b[1]);
+		close(a_to_b[1]);
+		close(b_to_a[0]);
+		exit(ret);
+	}
+
+	close(a_to_b[1]);
+	close(b_to_a[0]);
+	prctl(PR_SET_NAME, "proc2", 0, 0, 0);
+	ret = context_switch_loop_proc2(a_to_b[0], b_to_a[1]);
+	close(a_to_b[0]);
+	close(b_to_a[1]);
+
+	if (ret) {
+		kill(proc1_pid, SIGKILL);
+		return ret;
+	}
+
+	if (waitpid(proc1_pid, &status, 0) != proc1_pid || !WIFEXITED(status) ||
+	    WEXITSTATUS(status))
+		return 1;
+
+	return 0;
+}
+
+DEFINE_WORKLOAD(context_switch_loop);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 04/19] perf test cs-etm: Test process attribution
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (2 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 03/19] perf test: Add a workload that forces context switches James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 05/19] perf test: Add deterministic workload James Clark
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Run the context switch workload on one CPU and trace it to test that
symbols are attributed to the correct process and that the attribution
changes at the exact point that the context switch happened.
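
The cross-attribution check can be sketched in C as below. This is only
an illustration of the rule, assuming one 'perf script' output line per
call with the comm ahead of the symbol; the test itself uses grep, and
the names follows() and is_misattributed() are made up for this sketch:

```c
#include <string.h>

/* Equivalent of the grep pattern "a.*b": a occurs, and b occurs after it */
static int follows(const char *line, const char *a, const char *b)
{
	const char *p = strstr(line, a);

	return p && strstr(p + strlen(a), b);
}

/*
 * Return 1 if a line pairs one process name with the other process's
 * function, which is what the test treats as a failure.
 */
int is_misattributed(const char *line)
{
	return follows(line, "proc2", "context_switch_loop_proc1") ||
	       follows(line, "proc1", "context_switch_loop_proc2");
}
```

Because the two processes alternate on one CPU, a single stale thread
lookup around a context switch is enough to produce such a line.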

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/context_switch_thread.sh | 69 ++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/tools/perf/tests/shell/coresight/context_switch_thread.sh b/tools/perf/tests/shell/coresight/context_switch_thread.sh
new file mode 100755
index 000000000000..2b9c44b86c59
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/context_switch_thread.sh
@@ -0,0 +1,69 @@
+#!/bin/bash -e
+# CoreSight context switch thread attribution (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If CoreSight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+if [ "$(id -u)" != 0 ]; then
+	# Requires root for "-C 0" in record command
+	echo "[Skip] No root permission"
+	exit 2
+fi
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_context_switch.XXXXX)
+
+cleanup() {
+	rm -rf "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+check_samples() {
+	owner_samples=$(grep -c "proc1.*context_switch_loop_proc1" "$tmpdir/script" || true)
+	next_samples=$(grep -c "proc2.*context_switch_loop_proc2" "$tmpdir/script" || true)
+
+	if [ "$owner_samples" -eq 0 ] || [ "$next_samples" -eq 0 ]; then
+		echo "No samples found"
+		cleanup
+		exit 1
+	fi
+
+	if grep "proc2.*context_switch_loop_proc1" "$tmpdir/script"; then
+		echo "proc1 symbol was attributed to proc2"
+		cleanup
+		exit 1
+	fi
+
+	if grep "proc1.*context_switch_loop_proc2" "$tmpdir/script"; then
+		echo "proc2 symbol was attributed to proc1"
+		cleanup
+		exit 1
+	fi
+}
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+# Pin to one CPU so the two threads alternate running but record into the same
+# trace buffer. Start disabled and use the control FIFO to only record the
+# workload and not startup.
+perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -C 0 -D -1 --control fifo:"$cf","$af" -- \
+	taskset --cpu-list 0 perf test --record-ctl fifo:"$cf","$af" \
+	-w context_switch_loop > /dev/null 2>&1
+
+# Test both instruction and branch sample generation modes.
+perf script -i "$tmpdir/data" --itrace=i4 -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null
+check_samples
+perf script -i "$tmpdir/data" --itrace=b -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null
+check_samples
+
+cleanup
+exit 0

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 05/19] perf test: Add deterministic workload
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (3 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 04/19] perf test cs-etm: Test process attribution James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Add a workload that does the same thing every time for testing CPU trace
decoding.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt     |  4 +--
 tools/perf/tests/builtin-test.c            |  1 +
 tools/perf/tests/tests.h                   |  1 +
 tools/perf/tests/workloads/Build           |  2 ++
 tools/perf/tests/workloads/deterministic.c | 39 ++++++++++++++++++++++++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 213eb62603eb..c50a4b2d2d29 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -57,7 +57,7 @@ OPTIONS
 --workload=::
 	Run a built-in workload, to list them use '--list-workloads', current
 	ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
-	context_switch_loop and landlock.
+	context_switch_loop, deterministic and landlock.
 
 	Used with the shell script regression tests.
 
@@ -66,7 +66,7 @@ OPTIONS
 		seconds: leafloop, noploop, sqrtloop, thloop
 		nrloops: brstack, context_switch_loop
 
-	The datasym and landlock workloads don't accept any.
+	The datasym, landlock and deterministic workloads don't accept any.
 
 --list-workloads::
 	List the available workloads to use with -w/--workload.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 9284f897de3c..ef7e3f52a383 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -164,6 +164,7 @@ static struct test_workload *workloads[] = {
 	&workload__inlineloop,
 	&workload__jitdump,
 	&workload__context_switch_loop,
+	&workload__deterministic,
 
 #ifdef HAVE_RUST_SUPPORT
 	&workload__code_with_type,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 7cd4da4e96d3..bcfe9c33fc66 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -246,6 +246,7 @@ DECLARE_WORKLOAD(traploop);
 DECLARE_WORKLOAD(inlineloop);
 DECLARE_WORKLOAD(jitdump);
 DECLARE_WORKLOAD(context_switch_loop);
+DECLARE_WORKLOAD(deterministic);
 
 #ifdef HAVE_RUST_SUPPORT
 DECLARE_WORKLOAD(code_with_type);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 7134a031cb7c..90f2d8aa4941 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -11,6 +11,7 @@ perf-test-y += traploop.o
 perf-test-y += inlineloop.o
 perf-test-y += jitdump.o
 perf-test-y += context_switch_loop.o
+perf-test-y += deterministic.o
 
 ifeq ($(CONFIG_RUST_SUPPORT),y)
     perf-test-y += code_with_type.o
@@ -23,3 +24,4 @@ CFLAGS_brstack.o          = -g -O0 -fno-inline -U_FORTIFY_SOURCE
 CFLAGS_datasym.o          = -g -O0 -fno-inline -U_FORTIFY_SOURCE
 CFLAGS_traploop.o         = -g -O0 -fno-inline -U_FORTIFY_SOURCE
 CFLAGS_inlineloop.o       = -g -O2
+CFLAGS_deterministic.o    = -g -O0 -fno-inline -U_FORTIFY_SOURCE
diff --git a/tools/perf/tests/workloads/deterministic.c b/tools/perf/tests/workloads/deterministic.c
new file mode 100644
index 000000000000..8a78519fd075
--- /dev/null
+++ b/tools/perf/tests/workloads/deterministic.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/compiler.h>
+#include "../tests.h"
+
+int dt_work = 1234;
+
+static void function1(void)
+{
+	dt_work += 7;
+	dt_work += 7;
+	dt_work += 7;
+}
+
+static void function2(void)
+{
+	dt_work += 7;
+	dt_work += 7;
+	dt_work += 7;
+}
+
+static int deterministic(int argc __maybe_unused,
+			 const char **argv __maybe_unused)
+{
+	dt_work += 7;
+	dt_work += 7;
+	dt_work += 7;
+
+	function1();
+
+	dt_work += 7;
+	dt_work += 7;
+	dt_work += 7;
+
+	function2();
+
+	return 0;
+}
+
+DEFINE_WORKLOAD(deterministic);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (4 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 05/19] perf test: Add deterministic workload James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 17:04   ` Leo Yan
  2026-06-09 14:40 ` [PATCH v5 07/19] perf test cs-etm: Remove asm_pure_loop test James Clark
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Testing a long sequence without branches seems like it would be better
as a decoder unit test, and this test doesn't test decoding either, so
it's not clear what bugs this is trying to catch.

The new deterministic workload has somewhat long sequences when built
unoptimized, and we can always lengthen them later if we want to. More
importantly, we now test that decoding always gives the same result for
the same sequence of code, which we've never had before.
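
The comparison the new script performs can be sketched in C as follows.
This is an illustration only, assuming the decoded samples have already
been reduced to source line numbers in deterministic.c; the function
names are hypothetical and the real test uses uniq, sed and diff:

```c
#include <stddef.h>

/* Brace lines and call sites that the script deletes before comparing */
static const int ignored[] = { 8, 12, 15, 19, 23, 28, 34, 36, 37 };

/* Line order expected from the deterministic workload */
static const int expected[] = { 24, 25, 26, 9, 10, 11, 30, 31, 32, 16, 17, 18 };

static int is_ignored(int line)
{
	for (size_t i = 0; i < sizeof(ignored) / sizeof(ignored[0]); i++) {
		if (ignored[i] == line)
			return 1;
	}
	return 0;
}

/*
 * Collapse runs of identical lines (like uniq), then drop the ignored
 * lines (like the sed step). out must have room for n entries. Returns
 * the number of entries written to out.
 */
size_t normalize_lines(const int *in, size_t n, int *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && in[i] == in[i - 1])
			continue;
		if (is_ignored(in[i]))
			continue;
		out[kept++] = in[i];
	}
	return kept;
}

/* Return 1 if the normalized lines equal the expected sequence */
int matches_expected(const int *got, size_t n)
{
	if (n != sizeof(expected) / sizeof(expected[0]))
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (got[i] != expected[i])
			return 0;
	}
	return 1;
}
```

Any dropped, duplicated or reordered range in the decode changes the
normalized sequence, so it shows up as a mismatch.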

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/coresight/deterministic.sh  | 72 ++++++++++++++++++++++
 .../tests/shell/coresight/unroll_loop_thread_10.sh | 22 -------
 2 files changed, 72 insertions(+), 22 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/deterministic.sh b/tools/perf/tests/shell/coresight/deterministic.sh
new file mode 100755
index 000000000000..75d4973056f0
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/deterministic.sh
@@ -0,0 +1,72 @@
+#!/bin/bash -e
+# CoreSight deterministic workload decode (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If CoreSight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_deterministic.XXXXX)
+
+cleanup() {
+	rm -rf "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+# Start disabled and use the control FIFO to only record the workload and not
+# startup.
+perf record -o "$tmpdir/data" -e cs_etm//u -D -1 --control fifo:"$cf","$af" -- \
+	perf test --record-ctl fifo:"$cf","$af" -w deterministic > /dev/null 2>&1
+
+perf script -i "$tmpdir/data" --itrace=i1i -F ip,srcline | \
+	grep "deterministic.c" | uniq > "$tmpdir/script" 2>/dev/null
+
+
+# Remove brace lines and call sites as they may not be hit or may have
+# extra hits after returning, depending on the compiler.
+sed -i \
+  -e '/deterministic.c:8$/d' \
+  -e '/deterministic.c:12$/d' \
+  -e '/deterministic.c:15$/d' \
+  -e '/deterministic.c:19$/d' \
+  -e '/deterministic.c:23$/d' \
+  -e '/deterministic.c:28$/d' \
+  -e '/deterministic.c:34$/d' \
+  -e '/deterministic.c:36$/d' \
+  -e '/deterministic.c:37$/d' \
+  "$tmpdir/script"
+
+cat > "$tmpdir/expected" << EOF
+  deterministic.c:24
+  deterministic.c:25
+  deterministic.c:26
+  deterministic.c:9
+  deterministic.c:10
+  deterministic.c:11
+  deterministic.c:30
+  deterministic.c:31
+  deterministic.c:32
+  deterministic.c:16
+  deterministic.c:17
+  deterministic.c:18
+EOF
+
+if ! diff -q "$tmpdir/script" "$tmpdir/expected"; then
+	echo "FAIL: line numbers don't match expected: "
+	head -n 100 "$tmpdir/script"
+	cleanup
+	exit 1
+fi
+
+cleanup
+exit 0
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh b/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh
deleted file mode 100755
index cb3e97a0a89f..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Unroll Loop Thread 10 (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="unroll_loop_thread"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="10"
-DATV="10"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 07/19] perf test cs-etm: Remove asm_pure_loop test
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (5 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

It's not obvious what this test is for so remove it. It's not a stress
test because it doesn't output lots of data and it's not a functional
test because it only looks for raw trace output. It seems to imply that
a program written in assembly influences whether trace would be
generated by the CPU or not, but the CPU doesn't know what language the
program is written in.

We already have lots of Coresight tests that test the full pipeline
including decoding, and in many more modes of operation than this one,
so if no trace was collected they will already fail leaving this one
redundant.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop.sh b/tools/perf/tests/shell/coresight/asm_pure_loop.sh
deleted file mode 100755
index 0301904b9637..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / ASM Pure Loop (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="asm_pure_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS=""
-DATV="out"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (6 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 07/19] perf test cs-etm: Remove asm_pure_loop test James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 17:16   ` Leo Yan
  2026-06-09 14:40 ` [PATCH v5 09/19] perf test: Add named_threads workload James Clark
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Like asm_pure_loop, this memcpy test only checks that 10 of each of a
few trace packet types occur after recording a lot of trace, which isn't
more specific than other existing Coresight tests.

Assume it was supposed to be a stress test for dumping and replace it
with one that doesn't require a custom binary and checks for a specific
amount of raw output. Don't bother checking for packets because the
other tests that test decoding will catch issues with malformed data.

This also adds coverage for exit snapshot mode which was missing.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 .../tests/shell/coresight/memcpy_thread_16k_10.sh  | 22 --------
 .../perf/tests/shell/coresight/raw_dump_stress.sh  | 65 ++++++++++++++++++++++
 2 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh b/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh
deleted file mode 100755
index 1f765d69acc3..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Memcpy 16k 10 Threads (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="memcpy_thread"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="16 10 1"
-DATV="16k_10"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err
diff --git a/tools/perf/tests/shell/coresight/raw_dump_stress.sh b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
new file mode 100755
index 000000000000..bea70d825596
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
@@ -0,0 +1,65 @@
+#!/bin/bash -e
+# CoreSight raw dump stress (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+if [ "$(id -u)" != 0 ]; then
+	# Requires root for larger buffer size
+	echo "[Skip] No root permission"
+	exit 2
+fi
+
+# If CoreSight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_raw_dump_stress.XXXXX)
+
+cleanup() {
+	rm -r "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+# Use exit snapshot to record 2M of trace to make about 80MB of raw dump data.
+echo "Recording..."
+perf record -e cs_etm/timestamp=0/u -m,2M -Se -o "$tmpdir/data" -- \
+	perf test -w brstack 20000 > /dev/null 2>&1
+
+# Test raw dump runs to completion but don't decode because that's too slow for
+# a test
+echo "Dumping raw trace..."
+perf report --dump-raw-trace -i "$tmpdir/data" 2>/dev/null > "$tmpdir/rawdump"
+
+# Get the size and offset of the first AUXTRACE buffer and the index of the last
+# packet in the raw dump.
+read -r size offset last_idx <<< "$(awk '
+	found && /PERF_RECORD_/ { exit }
+	/PERF_RECORD_AUXTRACE / { found = 1; size = $7; offset = $9; next }
+	found && /Idx:/ { last_idx = $1; gsub(/Idx:|;/, "", last_idx) }
+	END { if (last_idx) print size, offset, last_idx }
+' "$tmpdir/rawdump")"
+
+# The last Idx minus start offset should equal the size of the buffer if
+# everything was dumped. Allow 48 bytes difference to cover 3 frames: current
+# frame length, a partial frame and a final empty one, all of which aren't
+# dumped.
+#
+# TODO: for a single snapshot, offset should always be zero. However, we
+# currently output AUX records in snapshot mode when we shouldn't, which
+# increments the offset. Allow for that until it's fixed so we can test raw
+# dumping.
+decode_size=$((1 + last_idx - offset))
+if [ "$decode_size" -gt "$((size - 48))" ] && [ "$decode_size" -le "$((size))" ]; then
+	echo "PASS: AUXTRACE buffer length matches dumped packet index"
+	cleanup
+	exit 0
+fi
+
+echo "FAIL: AUXTRACE buffer length mismatch: size=$size offset=$offset last_idx=$last_idx decode_size=$decode_size"
+cleanup
+exit 1

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 09/19] perf test: Add named_threads workload
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (7 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 16:00   ` sashiko-bot
  2026-06-09 14:40 ` [PATCH v5 10/19] perf test cs-etm: Test decoding for concurrent threads test James Clark
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Add a workload that runs X threads, each running a unique function named
"named_threads_thread[x]" which performs an addition in a loop for Y
iterations. Each thread sets its name to "thread[x]".

This can be used to test that processor trace decoding handles
concurrent threads correctly and the correct symbols and thread names
are assigned to samples.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/Documentation/perf-test.txt     |   5 +-
 tools/perf/tests/builtin-test.c            |   1 +
 tools/perf/tests/tests.h                   |   1 +
 tools/perf/tests/workloads/Build           |   1 +
 tools/perf/tests/workloads/named_threads.c | 109 +++++++++++++++++++++++++++++
 5 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index c50a4b2d2d29..81c8525f5946 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -57,7 +57,7 @@ OPTIONS
 --workload=::
 	Run a built-in workload, to list them use '--list-workloads', current
 	ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
-	context_switch_loop, deterministic and landlock.
+	context_switch_loop, deterministic, named_threads and landlock.
 
 	Used with the shell script regression tests.
 
@@ -66,6 +66,9 @@ OPTIONS
 		seconds: leafloop, noploop, sqrtloop, thloop
 		nrloops: brstack, context_switch_loop
 
+	'named_threads' accepts the number of threads and the number of loops to
+	do in each thread.
+
 	The datasym, landlock and deterministic workloads don't accept any.
 
 --list-workloads::
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index ef7e3f52a383..afc06cec4954 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -155,6 +155,7 @@ static struct test_suite *generic_tests[] = {
 static struct test_workload *workloads[] = {
 	&workload__noploop,
 	&workload__thloop,
+	&workload__named_threads,
 	&workload__leafloop,
 	&workload__sqrtloop,
 	&workload__brstack,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index bcfe9c33fc66..7cedf05be544 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -237,6 +237,7 @@ struct test_workload workload__##work = {	\
 /* The list of test workloads */
 DECLARE_WORKLOAD(noploop);
 DECLARE_WORKLOAD(thloop);
+DECLARE_WORKLOAD(named_threads);
 DECLARE_WORKLOAD(leafloop);
 DECLARE_WORKLOAD(sqrtloop);
 DECLARE_WORKLOAD(brstack);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 90f2d8aa4941..75b377934a0e 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -2,6 +2,7 @@
 
 perf-test-y += noploop.o
 perf-test-y += thloop.o
+perf-test-y += named_threads.o
 perf-test-y += leafloop.o
 perf-test-y += sqrtloop.o
 perf-test-y += brstack.o
diff --git a/tools/perf/tests/workloads/named_threads.c b/tools/perf/tests/workloads/named_threads.c
new file mode 100644
index 000000000000..d051d41a3cfe
--- /dev/null
+++ b/tools/perf/tests/workloads/named_threads.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <limits.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <linux/compiler.h>
+#include "../tests.h"
+
+#define MAX_THREADS 25
+
+static int iterations = 500;
+int named_threads_work = 1234;
+
+typedef void *(*thread_fn_t)(void *);
+
+#define DEFINE_THREAD(n)						\
+noinline void *named_threads_thread##n(void *arg __maybe_unused)	\
+{									\
+	pthread_setname_np(pthread_self(), "thread" #n);		\
+	for (int i = 0; i < iterations; i++)				\
+		named_threads_work += 3;				\
+									\
+	return NULL;							\
+}
+
+#define THREAD_LIST(macro)	\
+	macro(1)		\
+	macro(2)		\
+	macro(3)		\
+	macro(4)		\
+	macro(5)		\
+	macro(6)		\
+	macro(7)		\
+	macro(8)		\
+	macro(9)		\
+	macro(10)		\
+	macro(11)		\
+	macro(12)		\
+	macro(13)		\
+	macro(14)		\
+	macro(15)		\
+	macro(16)		\
+	macro(17)		\
+	macro(18)		\
+	macro(19)		\
+	macro(20)		\
+	macro(21)		\
+	macro(22)		\
+	macro(23)		\
+	macro(24)		\
+	macro(25)
+
+#define DECLARE_THREAD(n) void *named_threads_thread##n(void *arg);
+
+THREAD_LIST(DECLARE_THREAD)
+THREAD_LIST(DEFINE_THREAD)
+
+#define THREAD_ENTRY(n) named_threads_thread##n,
+
+static thread_fn_t thread_fns[MAX_THREADS] = {
+	THREAD_LIST(THREAD_ENTRY)
+};
+
+/*
+ * Creates argv[0] threads that each run a unique function named
+ * "named_threads_thread[x]" which performs an addition in a loop for argv[1] loops.
+ */
+static int named_threads(int argc, const char **argv)
+{
+	pthread_t threads[MAX_THREADS];
+	int nr_threads = 1;
+	int err = 0;
+
+	if (argc > 0)
+		nr_threads = atoi(argv[0]);
+
+	if (nr_threads <= 0 || nr_threads > MAX_THREADS) {
+		fprintf(stderr, "Error: num threads must be 1 - %d\n", MAX_THREADS);
+		return 1;
+	}
+
+	if (argc > 1)
+		iterations = atoi(argv[1]);
+
+	if (iterations < 0) {
+		fprintf(stderr, "Error: iterations must be non-negative\n");
+		return 1;
+	}
+
+	for (int i = 0; i < nr_threads; i++) {
+		int ret;
+
+		ret = pthread_create(&threads[i], NULL, thread_fns[i], NULL);
+		if (ret) {
+			fprintf(stderr, "Error: failed to create thread%d: %s\n",
+				i + 1, strerror(ret));
+			return 1;
+		}
+	}
+
+	for (int i = 0; i < nr_threads; i++)
+		pthread_join(threads[i], NULL);
+
+	return err;
+}
+
+DEFINE_WORKLOAD(named_threads);

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 10/19] perf test cs-etm: Test decoding for concurrent threads test
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (8 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 09/19] perf test: Add named_threads workload James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 17:18   ` Leo Yan
  2026-06-09 14:40 ` [PATCH v5 11/19] perf test cs-etm: Remove duplicate branch tests James Clark
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

The thread_loop test only looks for context IDs in the raw trace.
There's a lot more that can go wrong when decoding these, so replace it
with a test that looks at the final output for matching thread names and
symbols.

In the future we might use timestamps and context switch events to track
threads, so looking at context IDs in the raw trace wouldn't always
work.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
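Note: the per-thread check in concurrent_threads.sh can be tried in isolation
with the sketch below. It is only an illustration and not part of the patch:
all_threads_traced is a hypothetical helper name, and the two sample lines are
invented to have the rough shape of "thread name ... symbol" rather than being
real perf script output.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every thread 1..N has at least one
# line pairing its thread name with its own function symbol.
all_threads_traced() {
	local script=$1 nthreads=$2 i

	for i in $(seq 1 "$nthreads"); do
		# The space after "thread${i}" stops thread1 matching thread10
		if ! grep -q "thread${i} .* named_threads_thread${i}" "$script"; then
			echo "Error: thread${i} missing" >&2
			return 1
		fi
	done
	return 0
}

# Invented sample lines, not real perf script output
sample=$(mktemp)
cat > "$sample" <<'EOF'
thread1 1001 [000] 1 branches: aaaa0 named_threads_thread1+0x10 (perf)
thread2 1002 [001] 1 branches: aaaa4 named_threads_thread2+0x10 (perf)
EOF

if all_threads_traced "$sample" 2; then
	echo "all threads found"
fi
rm -f "$sample"
```

The symbol half of the pattern is a prefix match, so "named_threads_thread1"
would also match "named_threads_thread10". Whether that matters depends on the
exact perf script output format, which this sketch does not assume.
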
 .../tests/shell/coresight/concurrent_threads.sh    | 45 ++++++++++++++++++++++
 .../shell/coresight/thread_loop_check_tid_10.sh    | 23 -----------
 .../shell/coresight/thread_loop_check_tid_2.sh     | 23 -----------
 3 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/tools/perf/tests/shell/coresight/concurrent_threads.sh b/tools/perf/tests/shell/coresight/concurrent_threads.sh
new file mode 100755
index 000000000000..3349fff8c767
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/concurrent_threads.sh
@@ -0,0 +1,45 @@
+#!/bin/bash -e
+# CoreSight concurrent threads (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If CoreSight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_concurrent_threads.XXXXX)
+
+cleanup() {
+	rm -rf "${tmpdir}"
+	trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+	cleanup
+	exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+nthreads=10
+
+# Timestamps off to reduce trace size, start disabled and use the control FIFO
+# to only record the workload and not startup.
+perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -D -1 --control fifo:"$cf","$af" \
+	-- perf test --record-ctl fifo:"$cf","$af" -w named_threads $nthreads 1 > /dev/null 2>&1
+
+perf script -i "$tmpdir/data" > "$tmpdir/script" 2>/dev/null
+
+# Check all threads were traced and they have the correct thread name and symbol
+for i in $(seq 1 $nthreads); do
+	if ! grep -q "thread${i} .* named_threads_thread${i}" "$tmpdir/script"; then
+		echo "Error: thread${i} missing" >&2
+		cleanup
+		exit 1
+	fi
+done
+
+cleanup
+exit 0
diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh
deleted file mode 100755
index 7f43a93a2ac2..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Thread Loop 10 Threads - Check TID (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="thread_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="10 1"
-DATV="check-tid-10th"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-STDO="$DATD/perf-$TEST-$DATV.stdout"
-
-SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO
-
-perf_dump_aux_tid_verify "$DATA" "$STDO"
-
-err=$?
-exit $err
diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh
deleted file mode 100755
index a94d2079ed06..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Thread Loop 2 Threads - Check TID (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="thread_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="2 20"
-DATV="check-tid-2th"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-STDO="$DATD/perf-$TEST-$DATV.stdout"
-
-SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO
-
-perf_dump_aux_tid_verify "$DATA" "$STDO"
-
-err=$?
-exit $err

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 11/19] perf test cs-etm: Remove duplicate branch tests
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (9 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 10/19] perf test cs-etm: Test decoding for concurrent threads test James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 12/19] perf test cs-etm: Skip if not root James Clark
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

We already test branch output in Perf script mode, but then retest it in
Perf report mode. This is more of a test of Perf itself than Coresight
because Perf uses the same samples to generate both outputs. Also, we're
already testing instruction output in Perf report mode.

Remove this test for a speedup. On the systemwide test also remove the
Perf report test because systemwide mode records a lot more data so
running multiple tests on it has a big runtime impact.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index bbf89e944e7b..39553702c1f3 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -52,17 +52,6 @@ perf_script_branch_samples() {
 		grep -E " +$1 +[0-9]+ .* +branches:(.*:)? +" > /dev/null 2>&1
 }
 
-perf_report_branch_samples() {
-	echo "Looking at perf.data file for reporting branch samples:"
-
-	# Below is an example of the branch samples reporting:
-	#   73.04%    73.04%  touch    libc-2.27.so      [.] _dl_addr
-	#    7.71%     7.71%  touch    libc-2.27.so      [.] getenv
-	#    2.59%     2.59%  touch    ld-2.27.so        [.] strcmp
-	perf report --stdio -i ${perfdata} 2>&1 | \
-		grep -E " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
-}
-
 perf_report_instruction_samples() {
 	echo "Looking at perf.data file for instruction samples:"
 
@@ -123,7 +112,6 @@ arm_cs_iterate_devices() {
 
 			record_touch_file $device_name $2 &&
 			perf_script_branch_samples touch &&
-			perf_report_branch_samples touch &&
 			perf_report_instruction_samples touch
 
 			err=$?
@@ -154,9 +142,7 @@ arm_cs_etm_system_wide_test() {
 
 	# System-wide mode should include perf samples so test for that
 	# instead of ls
-	perf_script_branch_samples perf &&
-	perf_report_branch_samples perf &&
-	perf_report_instruction_samples perf
+	perf_script_branch_samples perf
 
 	err=$?
 	arm_cs_report "CoreSight system wide testing" $err
@@ -179,7 +165,6 @@ arm_cs_etm_snapshot_test() {
 	wait $PERFPID
 
 	perf_script_branch_samples dd &&
-	perf_report_branch_samples dd &&
 	perf_report_instruction_samples dd
 
 	err=$?
@@ -191,7 +176,6 @@ arm_cs_etm_basic_test() {
 	perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1
 
 	perf_script_branch_samples ls &&
-	perf_report_branch_samples ls &&
 	perf_report_instruction_samples ls
 
 	err=$?

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 12/19] perf test cs-etm: Skip if not root
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (10 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 11/19] perf test cs-etm: Remove duplicate branch tests James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 16:13   ` sashiko-bot
  2026-06-09 14:40 ` [PATCH v5 13/19] perf test cs-etm: Reduce snapshot size James Clark
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Use the common idiom for skipping tests if not running as root, which is
required for these tests.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh        | 6 ++++++
 tools/perf/tests/shell/test_arm_coresight_disasm.sh | 6 +++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 39553702c1f3..8ed2c934c87d 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -20,6 +20,12 @@ skip_if_no_cs_etm_event() {
 
 skip_if_no_cs_etm_event || exit 2
 
+if [ "$(id -u)" != 0 ]; then
+	# Requires root for -C and system wide tests
+	echo "[Skip] No root permission"
+	exit 2
+fi
+
 perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
 file=$(mktemp /tmp/temporary_file.XXXXX)
 
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 0dfb4fadf531..339ae4831868 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -42,7 +42,7 @@ sep="\s\|\s"
 branch_search="\sbl${sep}b${sep}b.ne${sep}b.eq${sep}cbz\s"
 
 ## Test kernel ##
-if [ -e /proc/kcore ]; then
+if [ "$(id -u)" == 0 ] && [ -e /proc/kcore ]; then
 	echo "Testing kernel disassembly"
 	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
@@ -50,8 +50,8 @@ if [ -e /proc/kcore ]; then
 	grep -q -e ${branch_search} ${file}
 	echo "Found kernel branches"
 else
-	# kcore is required for correct kernel decode due to runtime code patching
-	echo "No kcore, skipping kernel test"
+	# Root and kcore are required for correct kernel decode due to runtime code patching
+	echo "No root or kcore, skipping kernel test"
 fi
 
 ## Test user ##

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 13/19] perf test cs-etm: Reduce snapshot size
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (11 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 12/19] perf test cs-etm: Skip if not root James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 14/19] perf test cs-etm: Speed up basic test James Clark
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

The default buffer size for root is 4MB which is very slow to decode. We
only need a few KB to verify that the dd process is hit so reduce the
size to 128KB.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 8ed2c934c87d..da2f599393e2 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -156,7 +156,7 @@ arm_cs_etm_system_wide_test() {
 
 arm_cs_etm_snapshot_test() {
 	echo "Recording trace with snapshot mode"
-	perf record -o ${perfdata} -e cs_etm// -S \
+	perf record -o ${perfdata} -e cs_etm// -S -m,128K \
 		-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
 	PERFPID=$!
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 14/19] perf test cs-etm: Speed up basic test
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (12 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 13/19] perf test cs-etm: Reduce snapshot size James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Like the name says, this should be the most basic test possible. Kernel
recording is slow and already has coverage on the systemwide test. Perf
report output also has coverage elsewhere. 'ls' also produces more trace
than 'true'.

We only want to test if the combination of recording options works at
all, so fix all of these things to make it as fast as possible.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight.sh | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index da2f599393e2..83295a8fe179 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -179,10 +179,9 @@ arm_cs_etm_snapshot_test() {
 
 arm_cs_etm_basic_test() {
 	echo "Recording trace with '$*'"
-	perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1
+	perf record -o ${perfdata} "$@" -- true > /dev/null 2>&1
 
-	perf_script_branch_samples ls &&
-	perf_report_instruction_samples ls
+	perf_script_branch_samples true
 
 	err=$?
 	arm_cs_report "CoreSight basic testing with '$*'" $err
@@ -246,12 +245,12 @@ arm_cs_etm_snapshot_test
 
 # Test all combinations of per-thread, system-wide and normal mode with
 # and without timestamps
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u
 
 arm_cs_etm_sparse_cpus_test
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 15/19] perf test cs-etm: Remove unused Coresight workloads
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (13 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 14/19] perf test cs-etm: Speed up basic test James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 17:22   ` Leo Yan
  2026-06-09 14:40 ` [PATCH v5 16/19] perf test cs-etm: Make disassembly test use kcore James Clark
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

These are now unused and had various issues like not working with out of
source builds and being slow to compile. Delete them.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 Documentation/trace/coresight/coresight-perf.rst   |  78 +-----------
 MAINTAINERS                                        |   1 -
 tools/perf/Makefile.perf                           |  14 +--
 tools/perf/tests/shell/coresight/Makefile          |  29 -----
 .../perf/tests/shell/coresight/Makefile.miniconfig |  14 ---
 .../tests/shell/coresight/asm_pure_loop/.gitignore |   1 -
 .../tests/shell/coresight/asm_pure_loop/Makefile   |  34 ------
 .../shell/coresight/asm_pure_loop/asm_pure_loop.S  |  30 -----
 .../tests/shell/coresight/memcpy_thread/.gitignore |   1 -
 .../tests/shell/coresight/memcpy_thread/Makefile   |  33 -----
 .../shell/coresight/memcpy_thread/memcpy_thread.c  |  80 ------------
 .../tests/shell/coresight/thread_loop/.gitignore   |   1 -
 .../tests/shell/coresight/thread_loop/Makefile     |  33 -----
 .../shell/coresight/thread_loop/thread_loop.c      |  85 -------------
 .../shell/coresight/unroll_loop_thread/.gitignore  |   1 -
 .../shell/coresight/unroll_loop_thread/Makefile    |  33 -----
 .../unroll_loop_thread/unroll_loop_thread.c        |  75 ------------
 tools/perf/tests/shell/lib/coresight.sh            | 134 ---------------------
 18 files changed, 5 insertions(+), 672 deletions(-)

diff --git a/Documentation/trace/coresight/coresight-perf.rst b/Documentation/trace/coresight/coresight-perf.rst
index 30be89320621..0a77741a431e 100644
--- a/Documentation/trace/coresight/coresight-perf.rst
+++ b/Documentation/trace/coresight/coresight-perf.rst
@@ -112,78 +112,6 @@ Example for triggering AUX pause and resume with PMU event::
 Perf test - Verify kernel and userspace perf CoreSight work
 -----------------------------------------------------------
 
-When you run perf test, it will do a lot of self tests. Some of those
-tests will cover CoreSight (only if enabled and on ARM64). You
-generally would run perf test from the tools/perf directory in the
-kernel tree. Some tests will check some internal perf support like:
-
-   Check Arm CoreSight trace data recording and synthesized samples
-   Check Arm SPE trace data recording and synthesized samples
-
-Some others will actually use perf record and some test binaries that
-are in tests/shell/coresight and will collect traces to ensure a
-minimum level of functionality is met. The scripts that launch these
-tests are in the same directory. These will all look like:
-
-   CoreSight / ASM Pure Loop
-   CoreSight / Memcpy 16k 10 Threads
-   CoreSight / Thread Loop 10 Threads - Check TID
-   etc.
-
-These perf record tests will not run if the tool binaries do not exist
-in tests/shell/coresight/\*/ and will be skipped. If you do not have
-CoreSight support in hardware then either do not build perf with
-CoreSight support or remove these binaries in order to not have these
-tests fail and have them skip instead.
-
-These tests will log historical results in the current working
-directory (e.g. tools/perf) and will be named stats-\*.csv like:
-
-   stats-asm_pure_loop-out.csv
-   stats-memcpy_thread-16k_10.csv
-   ...
-
-These statistic files log some aspects of the AUX data sections in
-the perf data output counting some numbers of certain encodings (a
-good way to know that it's working in a very simple way). One problem
-with CoreSight is that given a large enough amount of data needing to
-be logged, some of it can be lost due to the processor not waking up
-in time to read out all the data from buffers etc.. You will notice
-that the amount of data collected can vary a lot per run of perf test.
-If you wish to see how this changes over time, simply run perf test
-multiple times and all these csv files will have more and more data
-appended to it that you can later examine, graph and otherwise use to
-figure out if things have become worse or better.
-
-This means sometimes these tests fail as they don't capture all the
-data needed. This is about tracking quality and amount of data
-produced over time and to see when changes to the Linux kernel improve
-quality of traces.
-
-Be aware that some of these tests take quite a while to run, specifically
-in processing the perf data file and dumping contents to then examine what
-is inside.
-
-You can change where these csv logs are stored by setting the
-PERF_TEST_CORESIGHT_STATDIR environment variable before running perf
-test like::
-
-   export PERF_TEST_CORESIGHT_STATDIR=/var/tmp
-   perf test
-
-They will also store resulting perf output data in the current
-directory for later inspection like::
-
-   perf-asm_pure_loop-out.data
-   perf-memcpy_thread-16k_10.data
-   ...
-
-You can alter where the perf data files are stored by setting the
-PERF_TEST_CORESIGHT_DATADIR environment variable such as::
-
-   PERF_TEST_CORESIGHT_DATADIR=/var/tmp
-   perf test
-
-You may wish to set these above environment variables if you wish to
-keep the output of tests outside of the current working directory for
-longer term storage and examination.
+There is a set of Perf tests for CoreSight which can be run with::
+
+  sudo perf test coresight
diff --git a/MAINTAINERS b/MAINTAINERS
index b539be153f6a..7efb893edcbb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2751,7 +2751,6 @@ F:	tools/perf/arch/arm/util/cs-etm.h
 F:	tools/perf/arch/arm/util/pmu.c
 F:	tools/perf/tests/shell/*coresight*
 F:	tools/perf/tests/shell/coresight/*
-F:	tools/perf/tests/shell/lib/*coresight*
 F:	tools/perf/util/cs-etm-decoder/*
 F:	tools/perf/util/cs-etm.*
 
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index ab661a1d271c..f29c267bf842 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -508,16 +508,7 @@ arm64-sysreg-defs-clean:
 	$(Q)$(MAKE) -C $(arm64_gen_sysreg_dir) O=$(arm64_gen_sysreg_outdir) \
 		prefix= subdir= clean > /dev/null
 
-TESTS_CORESIGHT_DIR := $(srctree)/tools/perf/tests/shell/coresight
-
-tests-coresight-targets: FORCE
-	$(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR)
-
-tests-coresight-targets-clean:
-	$(call QUIET_CLEAN, coresight)
-	$(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR) O=$(OUTPUT) clean >/dev/null
-
-all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS) tests-coresight-targets
+all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS)
 
 # Create python binding output directory if not already present
 $(shell [ -d '$(OUTPUT)python' ] || mkdir -p '$(OUTPUT)python')
@@ -896,7 +887,6 @@ install-tests: all install-gtk
 		$(INSTALL) tests/shell/base_report/*.txt '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/base_report'; \
 		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight' ; \
 		$(INSTALL) tests/shell/coresight/*.sh '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight'
-	$(Q)$(MAKE) -C tests/shell/coresight install-tests
 
 install-bin: install-tools install-tests
 
@@ -939,7 +929,7 @@ endif
 
 clean:: $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBSYMBOL)-clean $(LIBPERF)-clean \
 		arm64-sysreg-defs-clean fixdep-clean python-clean bpf-skel-clean \
-		tests-coresight-targets-clean pmu-events-clean
+		pmu-events-clean
 	$(call QUIET_CLEAN, core-objs)  $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive \
 		$(OUTPUT)perf-iostat $(LANG_BINDINGS)
 	$(Q)find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '*.a' -delete -o \
diff --git a/tools/perf/tests/shell/coresight/Makefile b/tools/perf/tests/shell/coresight/Makefile
deleted file mode 100644
index fa08fd9a5991..000000000000
--- a/tools/perf/tests/shell/coresight/Makefile
+++ /dev/null
@@ -1,29 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../../../../../tools/scripts/Makefile.include
-include ../../../../../tools/scripts/Makefile.arch
-include ../../../../../tools/scripts/utilities.mak
-
-SUBDIRS = \
-	asm_pure_loop \
-	memcpy_thread \
-	thread_loop \
-	unroll_loop_thread
-
-all: $(SUBDIRS)
-$(SUBDIRS):
-	@$(MAKE) -C $@ >/dev/null
-
-INSTALLDIRS = $(SUBDIRS:%=install-%)
-
-install-tests: $(INSTALLDIRS)
-$(INSTALLDIRS):
-	@$(MAKE) -C $(@:install-%=%) install-tests >/dev/null
-
-CLEANDIRS = $(SUBDIRS:%=clean-%)
-
-clean: $(CLEANDIRS)
-$(CLEANDIRS):
-	$(call QUIET_CLEAN, test-$(@:clean-%=%)) $(MAKE) -C $(@:clean-%=%) clean >/dev/null
-
-.PHONY: all clean $(SUBDIRS) $(CLEANDIRS) $(INSTALLDIRS)
diff --git a/tools/perf/tests/shell/coresight/Makefile.miniconfig b/tools/perf/tests/shell/coresight/Makefile.miniconfig
deleted file mode 100644
index 5f72a9cb43f3..000000000000
--- a/tools/perf/tests/shell/coresight/Makefile.miniconfig
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-ifndef DESTDIR
-prefix ?= $(HOME)
-endif
-
-DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
-INSTALL = install
-INSTDIR_SUB = tests/shell/coresight
-
-include ../../../../../scripts/Makefile.include
-include ../../../../../scripts/Makefile.arch
-include ../../../../../scripts/utilities.mak
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore b/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore
deleted file mode 100644
index 468673ac32e8..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-asm_pure_loop
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile b/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile
deleted file mode 100644
index 206849e92bc9..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile
+++ /dev/null
@@ -1,34 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=asm_pure_loop
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=
-
-all: $(BIN)
-
-$(BIN): $(BIN).S
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line - this is raw asm with no libc to have an always exact binary
-	$(Q)$(CC) $(BIN).S -nostdlib -static -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S b/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S
deleted file mode 100644
index 577760046772..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Tamas Zsoldos <tamas.zsoldos@arm.com>, 2021 */
-
-.globl _start
-_start:
-	mov	x0, 0x0000ffff
-	mov	x1, xzr
-loop:
-	nop
-	nop
-	cbnz	x1, noskip
-	nop
-	nop
-	adrp	x2, skip
-	add 	x2, x2, :lo12:skip
-	br	x2
-	nop
-	nop
-noskip:
-	nop
-	nop
-skip:
-	sub	x0, x0, 1
-	cbnz	x0, loop
-
-	mov	x0, #0
-	mov	x8, #93 // __NR_exit syscall
-	svc	#0
-
-.section .note.GNU-stack, "", @progbits
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore b/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore
deleted file mode 100644
index f8217e56091e..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-memcpy_thread
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile b/tools/perf/tests/shell/coresight/memcpy_thread/Makefile
deleted file mode 100644
index 2db637eb2c26..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=memcpy_thread
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
-	$(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c b/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c
deleted file mode 100644
index 7e879217be30..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c
+++ /dev/null
@@ -1,80 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-
-struct args {
-	unsigned long loops;
-	unsigned long size;
-	pthread_t th;
-	void *ret;
-};
-
-static void *thrfn(void *arg)
-{
-	struct args *a = arg;
-	unsigned long i, len = a->loops;
-	unsigned char *src, *dst;
-
-	src = malloc(a->size * 1024);
-	dst = malloc(a->size * 1024);
-	if ((!src) || (!dst)) {
-		printf("ERR: Can't allocate memory\n");
-		exit(1);
-	}
-	for (i = 0; i < len; i++)
-		memcpy(dst, src, a->size * 1024);
-
-	return NULL;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
-	pthread_t t;
-	pthread_attr_t attr;
-
-	pthread_attr_init(&attr);
-	pthread_create(&t, &attr, fn, arg);
-	return t;
-}
-
-int main(int argc, char **argv)
-{
-	unsigned long i, len, size, thr;
-	struct args args[256];
-	long long v;
-
-	if (argc < 4) {
-		printf("ERR: %s [copysize Kb] [numthreads] [numloops (hundreds)]\n", argv[0]);
-		exit(1);
-	}
-
-	v = atoll(argv[1]);
-	if ((v < 1) || (v > (1024 * 1024))) {
-		printf("ERR: max memory 1GB (1048576 KB)\n");
-		exit(1);
-	}
-	size = v;
-	thr = atol(argv[2]);
-	if ((thr < 1) || (thr > 256)) {
-		printf("ERR: threads 1-256\n");
-		exit(1);
-	}
-	v = atoll(argv[3]);
-	if ((v < 1) || (v > 40000000000ll)) {
-		printf("ERR: loops 1-40000000000 (hundreds)\n");
-		exit(1);
-	}
-	len = v * 100;
-	for (i = 0; i < thr; i++) {
-		args[i].loops = len;
-		args[i].size = size;
-		args[i].th = new_thr(thrfn, &(args[i]));
-	}
-	for (i = 0; i < thr; i++)
-		pthread_join(args[i].th, &(args[i].ret));
-	return 0;
-}
diff --git a/tools/perf/tests/shell/coresight/thread_loop/.gitignore b/tools/perf/tests/shell/coresight/thread_loop/.gitignore
deleted file mode 100644
index 6d4c33eaa9e8..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-thread_loop
diff --git a/tools/perf/tests/shell/coresight/thread_loop/Makefile b/tools/perf/tests/shell/coresight/thread_loop/Makefile
deleted file mode 100644
index ea846c038e7a..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=thread_loop
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
-	$(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c b/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c
deleted file mode 100644
index 86f3f548b006..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c
+++ /dev/null
@@ -1,85 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-// define this for gettid()
-#define _GNU_SOURCE
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-#include <sys/syscall.h>
-#ifndef SYS_gettid
-// gettid is 178 on arm64
-# define SYS_gettid 178
-#endif
-#define gettid() syscall(SYS_gettid)
-
-struct args {
-	unsigned int loops;
-	pthread_t th;
-	void *ret;
-};
-
-static void *thrfn(void *arg)
-{
-	struct args *a = arg;
-	int i = 0, len = a->loops;
-
-	if (getenv("SHOW_TID")) {
-		unsigned long long tid = gettid();
-
-		printf("%llu\n", tid);
-	}
-	asm volatile(
-		"loop:\n"
-		"add %w[i], %w[i], #1\n"
-		"cmp %w[i], %w[len]\n"
-		"blt loop\n"
-		: /* out */
-		: /* in */ [i] "r" (i), [len] "r" (len)
-		: /* clobber */
-	);
-	return (void *)(long)i;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
-	pthread_t t;
-	pthread_attr_t attr;
-
-	pthread_attr_init(&attr);
-	pthread_create(&t, &attr, fn, arg);
-	return t;
-}
-
-int main(int argc, char **argv)
-{
-	unsigned int i, len, thr;
-	struct args args[256];
-
-	if (argc < 3) {
-		printf("ERR: %s [numthreads] [numloops (millions)]\n", argv[0]);
-		exit(1);
-	}
-
-	thr = atoi(argv[1]);
-	if ((thr < 1) || (thr > 256)) {
-		printf("ERR: threads 1-256\n");
-		exit(1);
-	}
-	len = atoi(argv[2]);
-	if ((len < 1) || (len > 4000)) {
-		printf("ERR: max loops 4000 (millions)\n");
-		exit(1);
-	}
-	len *= 1000000;
-	for (i = 0; i < thr; i++) {
-		args[i].loops = len;
-		args[i].th = new_thr(thrfn, &(args[i]));
-	}
-	for (i = 0; i < thr; i++)
-		pthread_join(args[i].th, &(args[i].ret));
-	return 0;
-}
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore b/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore
deleted file mode 100644
index 2cb4e996dbf3..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-unroll_loop_thread
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile b/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile
deleted file mode 100644
index 6264c4e3abd1..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=unroll_loop_thread
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
-	$(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
-	$(call QUIET_INSTALL, tests) \
-		$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
-		$(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
-	$(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c b/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c
deleted file mode 100644
index 8f4e1c985ca3..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c
+++ /dev/null
@@ -1,75 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-
-struct args {
-	pthread_t th;
-	unsigned int in;
-	void *ret;
-};
-
-static void *thrfn(void *arg)
-{
-	struct args *a = arg;
-	unsigned int i, in = a->in;
-
-	for (i = 0; i < 10000; i++) {
-		asm volatile (
-// force an unroll of thia add instruction so we can test long runs of code
-#define SNIP1 "add %w[in], %w[in], #1\n"
-// 10
-#define SNIP2 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1
-// 100
-#define SNIP3 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2
-// 1000
-#define SNIP4 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3
-// 10000
-#define SNIP5 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4
-// 100000
-			SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5
-			: /* out */
-			: /* in */ [in] "r" (in)
-			: /* clobber */
-		);
-	}
-
-	return NULL;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
-	pthread_t t;
-	pthread_attr_t attr;
-
-	pthread_attr_init(&attr);
-	pthread_create(&t, &attr, fn, arg);
-	return t;
-}
-
-int main(int argc, char **argv)
-{
-	unsigned int i, thr;
-	struct args args[256];
-
-	if (argc < 2) {
-		printf("ERR: %s [numthreads]\n", argv[0]);
-		exit(1);
-	}
-
-	thr = atoi(argv[1]);
-	if ((thr > 256) || (thr < 1)) {
-		printf("ERR: threads 1-256\n");
-		exit(1);
-	}
-	for (i = 0; i < thr; i++) {
-		args[i].in = rand();
-		args[i].th = new_thr(thrfn, &(args[i]));
-	}
-	for (i = 0; i < thr; i++)
-		pthread_join(args[i].th, &(args[i].ret));
-	return 0;
-}
diff --git a/tools/perf/tests/shell/lib/coresight.sh b/tools/perf/tests/shell/lib/coresight.sh
deleted file mode 100644
index 184d62e7e5bd..000000000000
--- a/tools/perf/tests/shell/lib/coresight.sh
+++ /dev/null
@@ -1,134 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-# This is sourced from a driver script so no need for #!/bin... etc. at the
-# top - the assumption below is that it runs as part of sourcing after the
-# test sets up some basic env vars to say what it is.
-
-# This currently works with ETMv4 / ETF not any other packet types at thi
-# point. This will need changes if that changes.
-
-# perf record options for the perf tests to use
-PERFRECMEM="-m ,16M"
-PERFRECOPT="$PERFRECMEM -e cs_etm//u"
-
-TOOLS=$(dirname $0)
-DIR="$TOOLS/$TEST"
-BIN="$DIR/$TEST"
-# If the test tool/binary does not exist and is executable then skip the test
-if ! test -x "$BIN"; then exit 2; fi
-# If CoreSight is not available, skip the test
-perf list pmu | grep -q cs_etm || exit 2
-DATD="."
-# If the data dir env is set then make the data dir use that instead of ./
-if test -n "$PERF_TEST_CORESIGHT_DATADIR"; then
-	DATD="$PERF_TEST_CORESIGHT_DATADIR";
-fi
-# If the stat dir env is set then make the data dir use that instead of ./
-STATD="."
-if test -n "$PERF_TEST_CORESIGHT_STATDIR"; then
-	STATD="$PERF_TEST_CORESIGHT_STATDIR";
-fi
-
-# Called if the test fails - error code 1
-err() {
-	echo "$1"
-	exit 1
-}
-
-# Check that some statistics from our perf
-check_val_min() {
-	STATF="$4"
-	if test "$2" -lt "$3"; then
-		echo ", FAILED" >> "$STATF"
-		err "Sanity check number of $1 is too low ($2 < $3)"
-	fi
-}
-
-perf_dump_aux_verify() {
-	# Some basic checking that the AUX chunk contains some sensible data
-	# to see that we are recording something and at least a minimum
-	# amount of it. We should almost always see Fn packets in just about
-	# anything but certainly we will see some trace info and async
-	# packets
-	DUMP="$DATD/perf-tmp-aux-dump.txt"
-	perf report --stdio --dump -i "$1" | \
-		grep -o -e I_ATOM_F -e I_ASYNC -e I_TRACE_INFO > "$DUMP"
-	# Simply count how many of these packets we find to see that we are
-	# producing a reasonable amount of data - exact checks are not sane
-	# as this is a lossy process where we may lose some blocks and the
-	# compiler may produce different code depending on the compiler and
-	# optimization options, so this is rough just to see if we're
-	# either missing almost all the data or all of it
-	ATOM_FX_NUM=$(grep -c I_ATOM_F "$DUMP")
-	ASYNC_NUM=$(grep -c I_ASYNC "$DUMP")
-	TRACE_INFO_NUM=$(grep -c I_TRACE_INFO "$DUMP")
-	rm -f "$DUMP"
-
-	# Arguments provide minimums for a pass
-	CHECK_FX_MIN="$2"
-	CHECK_ASYNC_MIN="$3"
-	CHECK_TRACE_INFO_MIN="$4"
-
-	# Write out statistics, so over time you can track results to see if
-	# there is a pattern - for example we have less "noisy" results that
-	# produce more consistent amounts of data each run, to see if over
-	# time any techinques to  minimize data loss are having an effect or
-	# not
-	STATF="$STATD/stats-$TEST-$DATV.csv"
-	if ! test -f "$STATF"; then
-		echo "ATOM Fx Count, Minimum, ASYNC Count, Minimum, TRACE INFO Count, Minimum" > "$STATF"
-	fi
-	echo -n "$ATOM_FX_NUM, $CHECK_FX_MIN, $ASYNC_NUM, $CHECK_ASYNC_MIN, $TRACE_INFO_NUM, $CHECK_TRACE_INFO_MIN" >> "$STATF"
-
-	# Actually check to see if we passed or failed.
-	check_val_min "ATOM_FX" "$ATOM_FX_NUM" "$CHECK_FX_MIN" "$STATF"
-	check_val_min "ASYNC" "$ASYNC_NUM" "$CHECK_ASYNC_MIN" "$STATF"
-	check_val_min "TRACE_INFO" "$TRACE_INFO_NUM" "$CHECK_TRACE_INFO_MIN" "$STATF"
-	echo ", Ok" >> "$STATF"
-}
-
-perf_dump_aux_tid_verify() {
-	# Specifically crafted test will produce a list of Tread ID's to
-	# stdout that need to be checked to  see that they have had trace
-	# info collected in AUX blocks in the perf data. This will go
-	# through all the TID's that are listed as CID=0xabcdef and see
-	# that all the Thread IDs the test tool reports are  in the perf
-	# data AUX chunks
-
-	# The TID test tools will print a TID per stdout line that are being
-	# tested
-	TIDS=$(cat "$2")
-	# Scan the perf report to find the TIDs that are actually CID in hex
-	# and build a list of the ones found
-	FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \
-			grep -o "CID=0x[0-9a-z]\+" | sed 's/CID=//g' | \
-			uniq | sort | uniq)
-	# No CID=xxx found - maybe your kernel is reporting these as
-	# VMID=xxx so look there
-	if test -z "$FOUND_TIDS"; then
-		FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \
-				grep -o "VMID=0x[0-9a-z]\+" | sed 's/VMID=//g' | \
-				uniq | sort | uniq)
-	fi
-
-	# Iterate over the list of TIDs that the test says it has and find
-	# them in the TIDs found in the perf report
-	MISSING=""
-	for TID2 in $TIDS; do
-		FOUND=""
-		for TIDHEX in $FOUND_TIDS; do
-			TID=$(printf "%i" $TIDHEX)
-			if test "$TID" -eq "$TID2"; then
-				FOUND="y"
-				break
-			fi
-		done
-		if test -z "$FOUND"; then
-			MISSING="$MISSING $TID"
-		fi
-	done
-	if test -n "$MISSING"; then
-		err "Thread IDs $MISSING not found in perf AUX data"
-	fi
-}

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 16/19] perf test cs-etm: Make disassembly test use kcore
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (14 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 17/19] perf test cs-etm: Add all branch instructions to test James Clark
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

When vmlinux is the input to objdump, hits in modules return empty
disassembly because module code isn't part of vmlinux. Make the
disassembly test more reliable by always using kcore, and update the
comments to say that the script supports this.

Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/scripts/python/arm-cs-trace-disasm.py    | 20 ++++++++++----------
 tools/perf/tests/shell/test_arm_coresight_disasm.sh |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index ba208c90d631..8f6fa4a007b4 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -18,29 +18,29 @@ from perf_trace_context import perf_sample_srccode, perf_config_get
 
 # Below are some example commands for using this script.
 # Note a --kcore recording is required for accurate decode
-# due to the alternatives patching mechanism. However this
-# script only supports reading vmlinux for disassembly dump,
-# meaning that any patched instructions will appear
-# as unpatched, but the instruction ranges themselves will
-# be correct. In addition to this, source line info comes
-# from Perf, and when using kcore there is no debug info. The
-# following lists the supported features in each mode:
+# due to the alternatives patching mechanism. In addition to this,
+# source line info comes from Perf, and when using kcore there is
+# no debug info. The following lists the supported features in each mode:
 #
 # +-----------+-----------------+------------------+------------------+
 # | Recording | Accurate decode | Source line dump | Disassembly dump |
 # +-----------+-----------------+------------------+------------------+
 # | --kcore   | yes             | no               | yes              |
-# | normal    | no              | yes              | yes              |
+# | normal    | no              | yes (inaccurate) | yes (inaccurate) |
 # +-----------+-----------------+------------------+------------------+
 #
 # Output disassembly with objdump and auto detect vmlinux
-# (when running on same machine.)
+# (when running on same machine.):
 #  perf script -s scripts/python/arm-cs-trace-disasm.py -d
 #
 # Output disassembly with llvm-objdump:
 #  perf script -s scripts/python/arm-cs-trace-disasm.py \
 #		-- -d llvm-objdump-11 -k path/to/vmlinux
 #
+# Output accurate disassembly by passing kcore to script:
+#  perf script -s scripts/python/arm-cs-trace-disasm.py \
+#		-- -d -k perf.data/kcore_dir/kcore
+#
 # Output only source line and symbols:
 #  perf script -s scripts/python/arm-cs-trace-disasm.py
 
@@ -57,7 +57,7 @@ def int_arg(v):
 
 args = argparse.ArgumentParser()
 args.add_argument("-k", "--vmlinux",
-		  help="Set path to vmlinux file. Omit to autodetect if running on same machine")
+		  help="Set path to vmlinux or kcore file. Omit to autodetect if running on same machine")
 args.add_argument("-d", "--objdump", nargs="?", const=default_objdump(),
 		  help="Show disassembly. Can also be used to change the objdump path"),
 args.add_argument("-v", "--verbose", action="store_true", help="Enable debugging log")
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 339ae4831868..87797d239f76 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -46,7 +46,7 @@ if [ "$(id -u)" == 0 ] && [ -e /proc/kcore ]; then
 	echo "Testing kernel disassembly"
 	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
-		-d --stop-sample=30 2> /dev/null > ${file}
+		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
 	grep -q -e ${branch_search} ${file}
 	echo "Found kernel branches"
 else

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 17/19] perf test cs-etm: Add all branch instructions to test
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (15 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 16/19] perf test cs-etm: Make disassembly test use kcore James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 14:40 ` [PATCH v5 18/19] perf test cs-etm: Speed up disassembly test James Clark
  2026-06-09 14:40 ` [PATCH v5 19/19] perf test cs-etm: Move existing tests to coresight folder James Clark
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

If we reduce the number of samples searched to speed up the test, then
there will be less chance of hitting one of the few branch instructions
that the regex currently looks for. Extend the regex to cover all
branches so that the test still passes reliably.
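
As a rough illustration (not part of the patch), the new pattern can be
tried against a few objdump-style lines. The sample lines and the
is_branch helper below are made up for this sketch; only the regex is
taken from the diff.

```shell
#!/bin/sh
# Extended regex copied from the patch below.
branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al))?|br|blr|ret|cbz|cbnz|tbz|tbnz|svc|eret)([[:space:]]|$)'

# Hypothetical helper: succeeds if the line contains a branch mnemonic.
# The pattern is quoted here so the shell passes it to grep untouched.
is_branch() {
	printf '%s\n' "$1" | grep -q -E "$branch_search"
}

# Made-up disassembly lines, loosely in objdump's layout.
for line in \
	'ffff8000: d65f03c0  ret' \
	'ffff8004: b5000041  cbnz x1, ffff800c' \
	'ffff8008: 54000048  b.hi ffff8010' \
	'ffff800c: 91000400  add x0, x0, #0x1'
do
	if is_branch "$line"; then
		echo "branch: $line"
	else
		echo "other:  $line"
	fi
done
```

The old pattern only looked for bl, b, b.ne, b.eq and cbz, so lines like
the ret and cbnz ones above would not have counted as branches.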

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight_disasm.sh | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 87797d239f76..f78dfb6bf73e 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -38,8 +38,7 @@ cleanup_files()
 trap cleanup_files EXIT TERM INT
 
 # Ranges start and end on branches, so check for some likely branch instructions
-sep="\s\|\s"
-branch_search="\sbl${sep}b${sep}b.ne${sep}b.eq${sep}cbz\s"
+branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al))?|br|blr|ret|cbz|cbnz|tbz|tbnz|svc|eret)([[:space:]]|$)'
 
 ## Test kernel ##
 if [ "$(id -u)" == 0 ] && [ -e /proc/kcore ]; then
@@ -47,7 +46,7 @@ if [ "$(id -u)" == 0 ] && [ -e /proc/kcore ]; then
 	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
 		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
-	grep -q -e ${branch_search} ${file}
+	grep -q -E ${branch_search} ${file}
 	echo "Found kernel branches"
 else
 	# Root and kcore are required for correct kernel decode due to runtime code patching
@@ -59,7 +58,7 @@ echo "Testing userspace disassembly"
 perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
 perf script -i ${perfdata} -s python:${script_path} -- \
 	-d --stop-sample=30 2> /dev/null > ${file}
-grep -q -e ${branch_search} ${file}
+grep -q -E ${branch_search} ${file}
 echo "Found userspace branches"
 
 glb_err=0

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 18/19] perf test cs-etm: Speed up disassembly test
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (16 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 17/19] perf test cs-etm: Add all branch instructions to test James Clark
@ 2026-06-09 14:40 ` James Clark
  2026-06-09 16:48   ` sashiko-bot
  2026-06-09 14:40 ` [PATCH v5 19/19] perf test cs-etm: Move existing tests to coresight folder James Clark
  18 siblings, 1 reply; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

Use snapshot on exit (-Se) with a 64K AUX buffer to limit the amount of
trace to decode here too. Each call to objdump on kcore is also quite
expensive, so limit the script to 2 samples instead of 30. The test only
needs to detect the case where there is no data at all.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 tools/perf/tests/shell/test_arm_coresight_disasm.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index f78dfb6bf73e..f2fb1aa92252 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -43,9 +43,9 @@ branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)
 ## Test kernel ##
 if [ "$(id -u)" == 0 ] && [ -e /proc/kcore ]; then
 	echo "Testing kernel disassembly"
-	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
+	perf record -o ${perfdata} -e cs_etm//k --kcore -Se -m,64K -- touch $file > /dev/null 2>&1
 	perf script -i ${perfdata} -s python:${script_path} -- \
-		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
+		-d --stop-sample=2 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
 	grep -q -E ${branch_search} ${file}
 	echo "Found kernel branches"
 else
@@ -55,9 +55,9 @@ fi
 
 ## Test user ##
 echo "Testing userspace disassembly"
-perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
+perf record -o ${perfdata} -e cs_etm//u -Se -m,64K -- touch $file > /dev/null 2>&1
 perf script -i ${perfdata} -s python:${script_path} -- \
-	-d --stop-sample=30 2> /dev/null > ${file}
+	-d --stop-sample=2 2> /dev/null > ${file}
 grep -q -E ${branch_search} ${file}
 echo "Found userspace branches"
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 19/19] perf test cs-etm: Move existing tests to coresight folder
  2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
                   ` (17 preceding siblings ...)
  2026-06-09 14:40 ` [PATCH v5 18/19] perf test cs-etm: Speed up disassembly test James Clark
@ 2026-06-09 14:40 ` James Clark
  18 siblings, 0 replies; 27+ messages in thread
From: James Clark @ 2026-06-09 14:40 UTC (permalink / raw)
  To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis
  Cc: coresight, linux-perf-users, linux-kernel,
	Arnaldo Carvalho de Melo, linux-doc, James Clark

There is already a subfolder for Coresight tests, so move the remaining
tests there to keep them all in one place.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
 MAINTAINERS                                                         | 1 -
 tools/perf/tests/shell/{ => coresight}/test_arm_coresight.sh        | 0
 tools/perf/tests/shell/{ => coresight}/test_arm_coresight_disasm.sh | 2 +-
 3 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7efb893edcbb..ff8935b459ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2749,7 +2749,6 @@ F:	tools/perf/arch/arm/util/auxtrace.c
 F:	tools/perf/arch/arm/util/cs-etm.c
 F:	tools/perf/arch/arm/util/cs-etm.h
 F:	tools/perf/arch/arm/util/pmu.c
-F:	tools/perf/tests/shell/*coresight*
 F:	tools/perf/tests/shell/coresight/*
 F:	tools/perf/util/cs-etm-decoder/*
 F:	tools/perf/util/cs-etm.*
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/coresight/test_arm_coresight.sh
similarity index 100%
rename from tools/perf/tests/shell/test_arm_coresight.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight.sh
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
similarity index 96%
rename from tools/perf/tests/shell/test_arm_coresight_disasm.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
index f2fb1aa92252..ccb90dda2475 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
@@ -24,7 +24,7 @@ perfdata_dir=$(mktemp -d /tmp/__perf_test.perf.data.XXXXX)
 perfdata=${perfdata_dir}/perf.data
 file=$(mktemp /tmp/temporary_file.XXXXX)
 # Relative path works whether it's installed or running from repo
-script_path=$(dirname "$0")/../../scripts/python/arm-cs-trace-disasm.py
+script_path=$(dirname "$0")/../../../scripts/python/arm-cs-trace-disasm.py
 
 cleanup_files()
 {

-- 
2.34.1



* Re: [PATCH v5 09/19] perf test: Add named_threads workload
  2026-06-09 14:40 ` [PATCH v5 09/19] perf test: Add named_threads workload James Clark
@ 2026-06-09 16:00   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-09 16:00 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Medium] The test workload loop is optimized away by the compiler, contains a C11 data race on a shared variable, and contradicts its commit message.
--

commit dcd726ac63da1e487f029b2a3e5128fec278becd
Author: James Clark <james.clark@linaro.org>

perf test: Add named_threads workload

This commit adds a new workload to perf tests that runs multiple threads,
each executing a unique function that loops and sets its thread name. It
is intended to test processor trace decoding for concurrent threads.

> perf test: Add named_threads workload
> 
> Add a workload that runs X threads that run a unique function named
> "named_threads_thread[x]" which performs a multiplication in a loop for

[Severity: Low]
The commit message mentions performing a multiplication, but the
implementation uses addition. Is this the intended behavior?

> Y loops. Each thread sets its name to "thread[x]".

[ ... ]

> diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
> index 90f2d8aa4941f..75b377934a0e6 100644
> --- a/tools/perf/tests/workloads/Build
> +++ b/tools/perf/tests/workloads/Build
> @@ -2,6 +2,7 @@
>  
>  perf-test-y += noploop.o
>  perf-test-y += thloop.o
> +perf-test-y += named_threads.o

[Severity: Medium]
In tools/perf/tests/workloads/Build, unlike other loop workloads in this
directory, there is no CFLAGS_named_threads.o = -O0 override.

Without this, will the compiler optimize the loop away into a single
arithmetic operation and remove the branch instructions this test relies on?

>  perf-test-y += leafloop.o

[ ... ]

> diff --git a/tools/perf/tests/workloads/named_threads.c b/tools/perf/tests/workloads/named_threads.c
> new file mode 100644
> index 0000000000000..d051d41a3cfe4
> --- /dev/null
> +++ b/tools/perf/tests/workloads/named_threads.c

[ ... ]

> +#define DEFINE_THREAD(n)						\
> +noinline void *named_threads_thread##n(void *arg __maybe_unused)	\
> +{									\
> +	pthread_setname_np(pthread_self(), "thread" #n);		\
> +	for (int i = 0; i < iterations; i++)				\
> +		named_threads_work += 3;				\

[Severity: Medium]
In DEFINE_THREAD() within tools/perf/tests/workloads/named_threads.c,
named_threads_work is a shared global variable accessed concurrently by
multiple threads.

Without atomic operations or locks, does this introduce a data race?

This might trigger ThreadSanitizer test failures during execution.

> +									\
> +	return NULL;							\
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-james-cs-context-tracking-fix-v5-0-d53a7d096a19@linaro.org?part=9


* Re: [PATCH v5 12/19] perf test cs-etm: Skip if not root
  2026-06-09 14:40 ` [PATCH v5 12/19] perf test cs-etm: Skip if not root James Clark
@ 2026-06-09 16:13   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-09 16:13 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:

Pre-existing issues:
- [High] The script is vulnerable to a TOCTOU symlink attack that allows a local attacker to overwrite arbitrary system files with root privileges.
--

commit 31448575355dfbf022d3a5103f47303bea93bbf4
Author: James Clark <james.clark@linaro.org>

perf test cs-etm: Skip if not root

Use the common idiom for skipping tests if not running as root, which is
required for these tests.

> diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
> --- a/tools/perf/tests/shell/test_arm_coresight.sh
> +++ b/tools/perf/tests/shell/test_arm_coresight.sh
> @@ -20,6 +20,12 @@ skip_if_no_cs_etm_event() {
>  
>  skip_if_no_cs_etm_event || exit 2
>  
> +if [ "$(id -u)" != 0 ]; then
> +	# Requires root for -C and system wide tests
> +	echo "[Skip] No root permission"
> +	exit 2
> +fi
> +
>  perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
>  file=$(mktemp /tmp/temporary_file.XXXXX)

[Severity: High]
This is a pre-existing issue, but does enforcing root execution expose these
temporary files to a symlink attack?

While the temporary files are securely generated with mktemp in the
world-writable /tmp/ directory, the script breaks these guarantees before
writing to them.

Looking at record_touch_file() in tools/perf/tests/shell/test_arm_coresight.sh:

record_touch_file() {
	echo "Recording trace (only user mode) with path: CPU$2 => $1"
	rm -f $file
	perf record -o ${perfdata} -e cs_etm/@$1/u --per-thread \
		-- taskset -c $2 touch $file > /dev/null 2>&1
}

This explicitly deletes the safe file and executes touch as root.

Additionally, looking at arm_cs_etm_system_wide_test():

arm_cs_etm_system_wide_test() {
	echo "Recording trace with system wide mode"
	perf record -o ${perfdata} -e cs_etm// -a -- ls > /dev/null 2>&1

Successive perf record calls rename the existing ${perfdata} to .old, freeing
the securely generated filename.

In both cases, could a local attacker use inotify to detect the deletion or
rename, and insert a symlink pointing to an arbitrary file (like /etc/shadow)?

Since this patch explicitly requires the script to run as root, won't
subsequent file operations by perf record (which internally uses open with
O_CREAT | O_RDWR | O_TRUNC without O_NOFOLLOW) follow the symlink and
overwrite system files with root privileges?
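
One way to avoid freeing a name in /tmp is to keep every scratch file inside
a private directory, as test_arm_coresight_disasm.sh already does for its
perf.data. The following is a minimal sketch under that assumption, not a
tested patch; scratch_dir and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: keep all scratch files inside a private directory so that deleting
# and recreating them never frees a name in world-writable /tmp.
# scratch_dir and the file names below are hypothetical.
scratch_dir=$(mktemp -d /tmp/__perf_test.coresight.XXXXXX) || exit 1
perfdata=${scratch_dir}/perf.data
file=${scratch_dir}/touched_file

cleanup_files() {
	rm -rf "${scratch_dir}"
}
trap cleanup_files EXIT

# mktemp -d creates the directory with owner-only permissions (subject to
# umask), so other users cannot plant a symlink at ${file} between the rm
# and the next create.
rm -f "${file}"
touch "${file}"
```

With this layout, the rename of ${perfdata} to .old also happens inside the
private directory rather than in /tmp itself.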

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-james-cs-context-tracking-fix-v5-0-d53a7d096a19@linaro.org?part=12


* Re: [PATCH v5 18/19] perf test cs-etm: Speed up disassembly test
  2026-06-09 14:40 ` [PATCH v5 18/19] perf test cs-etm: Speed up disassembly test James Clark
@ 2026-06-09 16:48   ` sashiko-bot
  0 siblings, 0 replies; 27+ messages in thread
From: sashiko-bot @ 2026-06-09 16:48 UTC (permalink / raw)
  To: James Clark; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Medium] Reducing `--stop-sample` to 2 causes extreme test flakiness due to sample exhaustion and limited branch coverage.
--

commit ef4a0ff67bea24ddbc6e5c1bb7bfc2c172bd783d
Author: James Clark <james.clark@linaro.org>

perf test cs-etm: Speed up disassembly test

This commit limits the amount of trace to decode by using exit snapshots
and reduces the objdump sample limit from 30 to 2 to speed up the
disassembly test.

> diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
> index f78dfb6bf73e3..f2fb1aa92252a 100755
> --- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
> +++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
> @@ -43,9 +43,9 @@ branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)
>  ## Test kernel ##
>  if [ "$(id -u)" == 0 ] && [ -e /proc/kcore ]; then
>  	echo "Testing kernel disassembly"
> -	perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
> +	perf record -o ${perfdata} -e cs_etm//k --kcore -Se -m,64K -- touch $file > /dev/null 2>&1
>  	perf script -i ${perfdata} -s python:${script_path} -- \
> -		-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
> +		-d --stop-sample=2 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}

[Severity: Medium]
Will this severe reduction in the sample limit cause test flakiness?

Because sample index tracking in arm-cs-trace-disasm.py starts at -1,
setting the limit to 2 allows exactly 3 samples to be processed. The
script requires at least two branch events on the same CPU to invoke
objdump, since the first event only initializes the cpu data.

If a trace-on event or any instruction events are synthesized first,
they will consume the limited sample count without ever triggering objdump.
This could result in an empty disassembly output and a false negative
failure from grep.
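
The counting argument can be checked with a small model. This is a sketch
only: it assumes the counter starts at -1, is incremented for every event,
and that processing stops once the counter exceeds the limit, as described
above. count_processed is a hypothetical helper, not code from
arm-cs-trace-disasm.py.

```shell
#!/bin/sh
# Hypothetical model of the sample-limit check described above; this is not
# code from arm-cs-trace-disasm.py.
# Usage: count_processed <stop_sample> <events_available>
count_processed() {
	stop_sample=$1
	available=$2
	sample_idx=-1	# assumed starting value, per the review comment
	processed=0
	while [ "$processed" -lt "$available" ]; do
		sample_idx=$((sample_idx + 1))
		# Assumed check: stop once the index passes the limit
		if [ "$sample_idx" -gt "$stop_sample" ]; then
			break
		fi
		processed=$((processed + 1))
	done
	echo "$processed"
}

count_processed 2 100	# prints 3
count_processed 30 100	# prints 31
```

Under these assumptions a limit of 2 leaves room for only three events, so a
single non-branch event ahead of the first two branch events would already
use up the slack.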

Additionally, the branch regex lacks AArch32 branch-and-exchange
instructions such as bx or blx. Previously, larger sample sizes hid this
gap. Processing only one block could cause spurious failures on 32-bit
systems if the block ends in an unlisted branch.
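
To see how a pattern without exchange branches behaves, here is a minimal
sketch. Both patterns are illustrative stand-ins (the real branch_search is
truncated in the hunk above, so whether it has this gap is not shown here),
and the sample lines are made up rather than taken from objdump output.

```shell
#!/bin/sh
# Illustrative patterns only; the real branch_search is truncated in the hunk.
narrow='[[:space:]](bl|b)[[:space:]]'
wide='[[:space:]](bl|b|bx|blx)[[:space:]]'

# Hypothetical tab-separated disassembly lines, not real trace output
line_a64=$(printf '\t%s\t%s' 'bl' 'ffff800080010000')
line_a32=$(printf '\t%s\t%s' 'bx' 'lr')

# matches <pattern> <line>: prints "yes" if the line matches, else "no"
matches() {
	if printf '%s\n' "$2" | grep -q -E "$1"; then
		echo yes
	else
		echo no
	fi
}

matches "$narrow" "$line_a64"	# prints yes
matches "$narrow" "$line_a32"	# prints no
matches "$wide" "$line_a32"	# prints yes
```

If the only block decoded ends in a line like the second one, a grep with the
narrow pattern finds nothing and the test fails even though decoding worked.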

>  	grep -q -E ${branch_search} ${file}
>  	echo "Found kernel branches"
>  else

[ ... ]

> @@ -55,9 +55,9 @@ fi
>  
>  ## Test user ##
>  echo "Testing userspace disassembly"
> -perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
> +perf record -o ${perfdata} -e cs_etm//u -Se -m,64K -- touch $file > /dev/null 2>&1
>  perf script -i ${perfdata} -s python:${script_path} -- \
> -	-d --stop-sample=30 2> /dev/null > ${file}
> +	-d --stop-sample=2 2> /dev/null > ${file}

[Severity: Medium]
Are userspace disassembly tests similarly affected by this sample
exhaustion?

Like the kernel test above, reducing the limit to 2 might not process
enough events to trigger objdump if trace-on or instruction events are
encountered first.

>  grep -q -E ${branch_search} ${file}
>  echo "Found userspace branches"

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-james-cs-context-tracking-fix-v5-0-d53a7d096a19@linaro.org?part=18


* Re: [PATCH v5 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test
  2026-06-09 14:40 ` [PATCH v5 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
@ 2026-06-09 17:04   ` Leo Yan
  0 siblings, 0 replies; 27+ messages in thread
From: Leo Yan @ 2026-06-09 17:04 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc

On Tue, Jun 09, 2026 at 03:40:11PM +0100, James Clark wrote:
> Testing a long sequence without branches seems like it would be better
> as a decoder unit test, and this test doesn't test decoding either, so
> it's not clear what bugs this is trying to catch.
> 
> The new deterministic workload has somewhat long sequences when built
> unoptimized, and we can always increase them later if we want to. But
> now we test that decoding always gives the same result for the same
> sequence of code which we've never had before.
> 
> Signed-off-by: James Clark <james.clark@linaro.org>

Tested-by: Leo Yan <leo.yan@arm.com>


* Re: [PATCH v5 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test
  2026-06-09 14:40 ` [PATCH v5 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
@ 2026-06-09 17:16   ` Leo Yan
  0 siblings, 0 replies; 27+ messages in thread
From: Leo Yan @ 2026-06-09 17:16 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc

On Tue, Jun 09, 2026 at 03:40:13PM +0100, James Clark wrote:
> Like asm_pure_loop, this memcpy test only checks that 10 of each of a
> few trace packet types occur after recording a lot of trace, which isn't
> more specific than other existing Coresight tests.
> 
> Assume it was supposed to be a stress test for dumping and replace it
> with one that doesn't require a custom binary and checks for a specific
> amount of raw output. Don't bother checking for packets because the
> other tests that test decoding will catch issues with malformed data.
> 
> This also adds coverage for exit snapshot mode which was missing.
> 
> Signed-off-by: James Clark <james.clark@linaro.org>

Tested-by: Leo Yan <leo.yan@arm.com>


* Re: [PATCH v5 10/19] perf test cs-etm: Test decoding for concurrent threads test
  2026-06-09 14:40 ` [PATCH v5 10/19] perf test cs-etm: Test decoding for concurrent threads test James Clark
@ 2026-06-09 17:18   ` Leo Yan
  0 siblings, 0 replies; 27+ messages in thread
From: Leo Yan @ 2026-06-09 17:18 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc

On Tue, Jun 09, 2026 at 03:40:15PM +0100, James Clark wrote:
> The thread_loop test only looks for context IDs in the raw trace.
> There's a lot more that can go wrong when decoding these, so replace it
> with a test that looks at the final output for matching thread names and
> symbols.
> 
> In the future we might use timestamps and context switch events to track
> threads, so looking at context IDs in the raw trace wouldn't always
> work.
> 
> Reviewed-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>

Tested-by: Leo Yan <leo.yan@arm.com>


* Re: [PATCH v5 15/19] perf test cs-etm: Remove unused Coresight workloads
  2026-06-09 14:40 ` [PATCH v5 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
@ 2026-06-09 17:22   ` Leo Yan
  0 siblings, 0 replies; 27+ messages in thread
From: Leo Yan @ 2026-06-09 17:22 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, Mike Leach, Arnaldo Carvalho de Melo,
	Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
	Shuah Khan, Paschalis Mpeis, coresight, linux-perf-users,
	linux-kernel, Arnaldo Carvalho de Melo, linux-doc

On Tue, Jun 09, 2026 at 03:40:20PM +0100, James Clark wrote:
> These are now unused and had various issues like not working with out of
> source builds and being slow to compile. Delete them.

> +There are a set of Perf tests for CoreSight which can be run with::
> +
> +  sudo perf test coresight

Tested-by: Leo Yan <leo.yan@arm.com>


Thread overview: 27+ messages
2026-06-09 14:40 [PATCH v5 00/19] perf cs-etm: Queue context packets for frontend James Clark
2026-06-09 14:40 ` [PATCH v5 01/19] " James Clark
2026-06-09 14:40 ` [PATCH v5 02/19] perf test: Add workload-ctl option James Clark
2026-06-09 14:40 ` [PATCH v5 03/19] perf test: Add a workload that forces context switches James Clark
2026-06-09 14:40 ` [PATCH v5 04/19] perf test cs-etm: Test process attribution James Clark
2026-06-09 14:40 ` [PATCH v5 05/19] perf test: Add deterministic workload James Clark
2026-06-09 14:40 ` [PATCH v5 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
2026-06-09 17:04   ` Leo Yan
2026-06-09 14:40 ` [PATCH v5 07/19] perf test cs-etm: Remove asm_pure_loop test James Clark
2026-06-09 14:40 ` [PATCH v5 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
2026-06-09 17:16   ` Leo Yan
2026-06-09 14:40 ` [PATCH v5 09/19] perf test: Add named_threads workload James Clark
2026-06-09 16:00   ` sashiko-bot
2026-06-09 14:40 ` [PATCH v5 10/19] perf test cs-etm: Test decoding for concurrent threads test James Clark
2026-06-09 17:18   ` Leo Yan
2026-06-09 14:40 ` [PATCH v5 11/19] perf test cs-etm: Remove duplicate branch tests James Clark
2026-06-09 14:40 ` [PATCH v5 12/19] perf test cs-etm: Skip if not root James Clark
2026-06-09 16:13   ` sashiko-bot
2026-06-09 14:40 ` [PATCH v5 13/19] perf test cs-etm: Reduce snapshot size James Clark
2026-06-09 14:40 ` [PATCH v5 14/19] perf test cs-etm: Speed up basic test James Clark
2026-06-09 14:40 ` [PATCH v5 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
2026-06-09 17:22   ` Leo Yan
2026-06-09 14:40 ` [PATCH v5 16/19] perf test cs-etm: Make disassembly test use kcore James Clark
2026-06-09 14:40 ` [PATCH v5 17/19] perf test cs-etm: Add all branch instructions to test James Clark
2026-06-09 14:40 ` [PATCH v5 18/19] perf test cs-etm: Speed up disassembly test James Clark
2026-06-09 16:48   ` sashiko-bot
2026-06-09 14:40 ` [PATCH v5 19/19] perf test cs-etm: Move existing tests to coresight folder James Clark
