* [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend
@ 2026-06-03 10:17 James Clark
From: James Clark @ 2026-06-03 10:17 UTC
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Fix thread tracking when decoding Coresight trace and add a new test for
it.
The new test is added as a Perf test workload instead of a custom binary
with its own build system, but this requires a new feature in Perf test
to pass in control pipes which can enable and disable events. This
scopes the recording to just the workload and helps to reduce the amount
of data recorded in tracing tests.
With this new feature we can re-write all of the Coresight tests to make
use of it and remove the remaining binaries which fixes the following
issues:
* They didn't work in out-of-source builds
* A lot of the tests unnecessarily required root and didn't skip
without it
* They were mainly qualitative tests which didn't look for specific
behavior
Most importantly, the long build time and runtime have been reduced. On a
Radxa Orion O6, unroll_loop_thread.c took 37s to compile, which is longer
than the entire Perf build. Now the build time is negligible, and the
before and after test runtimes for all the Coresight tests are:
       | N1SDP  | Orion O6
-----------------------------------
Before | 4m  0s | 14m 49s
After  |    26s |     56s
-----------------------------------
Signed-off-by: James Clark <james.clark@linaro.org>
---
Changes in v3:
- Minor sashiko comments
- Close some more pipes
- Fix warning messages
- Error handling improvements
- Pass packet into cs_etm__synth_instruction_sample()
- Fixup stale comment (Leo)
- Link to v2: https://lore.kernel.org/r/20260602-james-cs-context-tracking-fix-v2-0-85b5ce6f55c6@linaro.org
Changes in v2:
- Add --workload-ctl option to Perf test
- Re-write all the Coresight tests and speed them up
- Pass packet to memory access function so frontend can use either the
previous or current packet's EL
- Link to v1: https://lore.kernel.org/r/20260526-james-cs-context-tracking-fix-v1-0-ebd602e18287@linaro.org
---
James Clark (19):
perf cs-etm: Queue context packets for frontend
perf test: Add workload-ctl option
perf test: Add a workload that forces context switches
perf test cs-etm: Test process attribution
perf test: Add deterministic workload
perf test cs-etm: Replace unroll loop thread with deterministic decode test
perf test cs-etm: Remove asm_pure_loop test
perf test cs-etm: Replace memcpy test with raw dump stress test
perf test: Add named_threads workload
perf test cs-etm: Test decoding for concurrent threads test
perf test cs-etm: Remove duplicate branch tests
perf test cs-etm: Skip if not root
perf test cs-etm: Reduce snapshot size
perf test cs-etm: Speed up basic test
perf test cs-etm: Remove unused Coresight workloads
perf test cs-etm: Make disassembly test use kcore
perf test cs-etm: Add all branch instructions to test
perf test cs-etm: Speed up disassembly test
perf test cs-etm: Move existing tests to coresight folder
Documentation/trace/coresight/coresight-perf.rst | 78 +------
MAINTAINERS | 2 -
tools/perf/Documentation/perf-test.txt | 18 +-
tools/perf/Makefile.perf | 14 +-
tools/perf/scripts/python/arm-cs-trace-disasm.py | 20 +-
tools/perf/tests/builtin-test.c | 187 +++++++++++++++-
tools/perf/tests/shell/coresight/Makefile | 29 ---
.../perf/tests/shell/coresight/Makefile.miniconfig | 14 --
tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 --
.../tests/shell/coresight/asm_pure_loop/.gitignore | 1 -
.../tests/shell/coresight/asm_pure_loop/Makefile | 34 ---
.../shell/coresight/asm_pure_loop/asm_pure_loop.S | 30 ---
.../tests/shell/coresight/concurrent_threads.sh | 45 ++++
.../tests/shell/coresight/context_switch_thread.sh | 69 ++++++
tools/perf/tests/shell/coresight/deterministic.sh | 71 +++++++
.../tests/shell/coresight/memcpy_thread/.gitignore | 1 -
.../tests/shell/coresight/memcpy_thread/Makefile | 33 ---
.../shell/coresight/memcpy_thread/memcpy_thread.c | 80 -------
.../tests/shell/coresight/memcpy_thread_16k_10.sh | 22 --
.../perf/tests/shell/coresight/raw_dump_stress.sh | 48 +++++
.../shell/{ => coresight}/test_arm_coresight.sh | 43 ++--
.../{ => coresight}/test_arm_coresight_disasm.sh | 17 +-
.../tests/shell/coresight/thread_loop/.gitignore | 1 -
.../tests/shell/coresight/thread_loop/Makefile | 33 ---
.../shell/coresight/thread_loop/thread_loop.c | 85 --------
.../shell/coresight/thread_loop_check_tid_10.sh | 23 --
.../shell/coresight/thread_loop_check_tid_2.sh | 23 --
.../shell/coresight/unroll_loop_thread/.gitignore | 1 -
.../shell/coresight/unroll_loop_thread/Makefile | 33 ---
.../unroll_loop_thread/unroll_loop_thread.c | 75 -------
.../tests/shell/coresight/unroll_loop_thread_10.sh | 22 --
tools/perf/tests/shell/lib/coresight.sh | 134 ------------
tools/perf/tests/tests.h | 3 +
tools/perf/tests/workloads/Build | 4 +
tools/perf/tests/workloads/context_switch_loop.c | 101 +++++++++
tools/perf/tests/workloads/deterministic.c | 39 ++++
tools/perf/tests/workloads/named_threads.c | 109 ++++++++++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 21 +-
tools/perf/util/cs-etm.c | 234 ++++++++++++---------
tools/perf/util/cs-etm.h | 8 +-
40 files changed, 889 insertions(+), 938 deletions(-)
---
base-commit: 5f0ca6b80b12bab1ce06839cdffb6148bb650ff4
change-id: 20260515-james-cs-context-tracking-fix-754998bae7ed
Best regards,
--
James Clark <james.clark@linaro.org>
* [PATCH v3 01/19] perf cs-etm: Queue context packets for frontend
From: James Clark @ 2026-06-03 10:17 UTC
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
PE_CONTEXT elements update the context ID and exception level, but the
decoder may still have prior packets cached for frontend processing.
Updating the context immediately in the decoder backend can make those
cached packets get consumed with the wrong thread or EL state.
Add a CS_ETM_CONTEXT packet carrying the TID and EL to the frontend.
This keeps context changes ordered with the rest of the packet stream
and avoids mismatches when synthesizing samples from cached packets.
Separate the memory access function into one for the frontend and one
for decoding. The frontend also needs memory access to attach the
instruction to samples. Because the frontend does memory access for
both previous and current packets, change all the frontend memory access
function signatures to take both a tidq and a packet. The backend,
however, always uses the current backend EL and thread from the tidq.
Treat context packets as a boundary for branch sample generation and
remove tidq->prev_packet_thread because it's not possible to branch to a
different thread, so only tracking the current thread is required for
sample generation.
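
The ordering problem described above can be shown with a small
standalone model. This is only a sketch: the packet kinds, struct
layout and the attribute_ranges() helper are hypothetical
simplifications, not the perf definitions. It shows that when a context
change travels in the queue as its own packet, a range that was queued
before it keeps the old TID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified packet kinds; not the perf enum. */
enum pkt_type { PKT_RANGE, PKT_CONTEXT };

struct pkt {
	enum pkt_type type;
	int tid;	/* only meaningful for PKT_CONTEXT */
};

/*
 * Walk the queued packets in order, the way a frontend would, and record
 * the TID that each range packet is attributed to. A context packet only
 * affects ranges queued after it. Returns the number of range packets.
 */
static size_t attribute_ranges(const struct pkt *q, size_t n,
			       int initial_tid, int *out_tids)
{
	int frontend_tid = initial_tid;
	size_t nr = 0;

	assert(q && out_tids);

	for (size_t i = 0; i < n; i++) {
		if (q[i].type == PKT_CONTEXT)
			frontend_tid = q[i].tid;	/* ordered update */
		else
			out_tids[nr++] = frontend_tid;
	}

	return nr;
}
```

If the TID were instead updated as soon as the decoder saw the context
element, both ranges in a queue of { range, context, range } would be
attributed to the new thread, which is the mismatch this patch avoids.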
Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
Reported-by: Amir Ayupov <aaupov@meta.com>
Closes: https://lore.kernel.org/linux-perf-users/20260515021135.1729028-1-aaupov@meta.com/
Co-authored-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 21 ++-
tools/perf/util/cs-etm.c | 234 ++++++++++++++----------
tools/perf/util/cs-etm.h | 8 +-
3 files changed, 162 insertions(+), 101 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index dee3020ceaa9..26940f1f1b0b 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_queue *etmq,
packet_queue->packet_buffer[et].flags = 0;
packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+ packet_queue->packet_buffer[et].el = ocsd_EL_unknown;
+ packet_queue->packet_buffer[et].tid = -1;
if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
return OCSD_RESP_WAIT;
@@ -449,6 +451,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
packet->last_instr_type = elem->last_i_type;
packet->last_instr_subtype = elem->last_i_subtype;
packet->last_instr_cond = elem->last_instr_cond;
+ packet->el = elem->context.exception_level;
if (elem->last_i_type == OCSD_INSTR_BR || elem->last_i_type == OCSD_INSTR_BR_INDIRECT)
packet->last_instr_taken_branch = elem->last_instr_exec;
@@ -525,7 +528,9 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
const ocsd_generic_trace_elem *elem,
const uint8_t trace_chan_id)
{
+ struct cs_etm_packet *packet;
pid_t tid = -1;
+ int ret;
/*
* Process the PE_CONTEXT packets if we have a valid contextID or VMID.
@@ -546,12 +551,18 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
break;
}
- if (cs_etm__etmq_set_tid_el(etmq, tid, trace_chan_id,
- elem->context.exception_level))
+ if (cs_etm__etmq_update_decode_context(etmq, trace_chan_id,
+ elem->context.exception_level, tid))
return OCSD_RESP_FATAL_SYS_ERR;
- if (tid == -1)
- return OCSD_RESP_CONT;
+ ret = cs_etm_decoder__buffer_packet(etmq, packet_queue, trace_chan_id,
+ CS_ETM_CONTEXT);
+ if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+ return ret;
+
+ packet = &packet_queue->packet_buffer[packet_queue->tail];
+ packet->tid = tid;
+ packet->el = elem->context.exception_level;
/*
* A timestamp is generated after a PE_CONTEXT element so make sure
@@ -559,7 +570,7 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
*/
cs_etm_decoder__reset_timestamp(packet_queue);
- return OCSD_RESP_CONT;
+ return ret;
}
static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 40c6ddfa8c8d..ce570913669c 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -85,15 +85,22 @@ struct cs_etm_traceid_queue {
u64 period_instructions;
size_t last_branch_pos;
union perf_event *event_buf;
- struct thread *thread;
- struct thread *prev_packet_thread;
- ocsd_ex_level prev_packet_el;
- ocsd_ex_level el;
struct branch_stack *last_branch;
struct branch_stack *last_branch_rb;
struct cs_etm_packet *prev_packet;
struct cs_etm_packet *packet;
struct cs_etm_packet_queue packet_queue;
+
+ struct thread *decode_thread;
+ ocsd_ex_level decode_el;
+
+ /*
+ * The frontend accesses the EL from '[prev_]packet' because it needs
+ * previous EL for branch and current EL for instruction samples. It's
+ * not possible to change thread in a single branch sample so no need to
+ * store or access the thread through the packet.
+ */
+ struct thread *frontend_thread;
};
enum cs_etm_format {
@@ -614,10 +621,11 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
tidq->trace_chan_id = trace_chan_id;
- tidq->el = tidq->prev_packet_el = ocsd_EL_unknown;
- tidq->thread = machine__findnew_thread(&etm->session->machines.host, -1,
+ tidq->decode_el = ocsd_EL_unknown;
+ tidq->frontend_thread = machine__findnew_thread(&etm->session->machines.host, -1,
+ queue->tid);
+ tidq->decode_thread = machine__findnew_thread(&etm->session->machines.host, -1,
queue->tid);
- tidq->prev_packet_thread = machine__idle_thread(&etm->session->machines.host);
tidq->packet = zalloc(sizeof(struct cs_etm_packet));
if (!tidq->packet)
@@ -750,21 +758,10 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
/*
* Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
* the next incoming packet.
- *
- * Threads and exception levels are also tracked for both the
- * previous and current packets. This is because the previous
- * packet is used for the 'from' IP for branch samples, so the
- * thread at that time must also be assigned to that sample.
- * Across discontinuity packets the thread can change, so by
- * tracking the thread for the previous packet the branch sample
- * will have the correct info.
*/
tmp = tidq->packet;
tidq->packet = tidq->prev_packet;
tidq->prev_packet = tmp;
- tidq->prev_packet_el = tidq->el;
- thread__put(tidq->prev_packet_thread);
- tidq->prev_packet_thread = thread__get(tidq->thread);
}
}
@@ -937,8 +934,8 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
/* Free this traceid_queue from the array */
tidq = etmq->traceid_queues[idx];
- thread__zput(tidq->thread);
- thread__zput(tidq->prev_packet_thread);
+ thread__zput(tidq->frontend_thread);
+ thread__zput(tidq->decode_thread);
zfree(&tidq->event_buf);
zfree(&tidq->last_branch);
zfree(&tidq->last_branch_rb);
@@ -1083,47 +1080,43 @@ static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address,
}
}
-static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
- u64 address, size_t size, u8 *buffer,
- const ocsd_mem_space_acc_t mem_space)
+static u32 __cs_etm__mem_access(struct cs_etm_queue *etmq,
+ u64 address, size_t size, u8 *buffer,
+ const ocsd_mem_space_acc_t mem_space,
+ ocsd_ex_level el, struct thread *thread)
{
u8 cpumode;
u64 offset;
int len;
struct addr_location al;
struct dso *dso;
- struct cs_etm_traceid_queue *tidq;
int ret = 0;
if (!etmq)
return 0;
addr_location__init(&al);
- tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
- if (!tidq)
- goto out;
/*
- * We've already tracked EL along side the PID in cs_etm__set_thread()
- * so double check that it matches what OpenCSD thinks as well. It
- * doesn't distinguish between EL0 and EL1 for this mem access callback
- * so we had to do the extra tracking. Skip validation if it's any of
- * the 'any' values.
+ * We track EL for the frontend and the backend when receiving context
+ * and range packets. OpenCSD doesn't distinguish between EL0 and EL1
+ * for this mem access callback so we had to do the extra tracking. Skip
+ * validation if it's any of the 'any' values.
*/
if (!(mem_space == OCSD_MEM_SPACE_ANY ||
mem_space == OCSD_MEM_SPACE_N || mem_space == OCSD_MEM_SPACE_S)) {
if (mem_space & OCSD_MEM_SPACE_EL1N) {
/* Includes both non secure EL1 and EL0 */
- assert(tidq->el == ocsd_EL1 || tidq->el == ocsd_EL0);
+ assert(el == ocsd_EL1 || el == ocsd_EL0);
} else if (mem_space & OCSD_MEM_SPACE_EL2)
- assert(tidq->el == ocsd_EL2);
+ assert(el == ocsd_EL2);
else if (mem_space & OCSD_MEM_SPACE_EL3)
- assert(tidq->el == ocsd_EL3);
+ assert(el == ocsd_EL3);
}
- cpumode = cs_etm__cpu_mode(etmq, address, tidq->el);
+ cpumode = cs_etm__cpu_mode(etmq, address, el);
- if (!thread__find_map(tidq->thread, cpumode, address, &al))
+ if (!thread__find_map(thread, cpumode, address, &al))
goto out;
dso = map__dso(al.map);
@@ -1138,7 +1131,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
map__load(al.map);
- len = dso__data_read_offset(dso, maps__machine(thread__maps(tidq->thread)),
+ len = dso__data_read_offset(dso, maps__machine(thread__maps(thread)),
offset, buffer, size);
if (len <= 0) {
@@ -1158,6 +1151,30 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
return ret;
}
+static u32 cs_etm__frontend_mem_access(struct cs_etm_queue *etmq,
+ struct cs_etm_traceid_queue *tidq,
+ struct cs_etm_packet *packet,
+ u64 address, size_t size, u8 *buffer)
+{
+ return __cs_etm__mem_access(etmq, address, size, buffer, 0, packet->el,
+ tidq->frontend_thread);
+}
+
+static u32 cs_etm__decoder_mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
+ u64 address, size_t size, u8 *buffer,
+ const ocsd_mem_space_acc_t mem_space)
+{
+ struct cs_etm_traceid_queue *tidq;
+
+ tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+ if (!tidq)
+ return 0;
+
+ return __cs_etm__mem_access(etmq, address, size, buffer,
+ mem_space, tidq->decode_el,
+ tidq->decode_thread);
+}
+
static struct cs_etm_queue *cs_etm__alloc_queue(void)
{
struct cs_etm_queue *etmq = zalloc(sizeof(*etmq));
@@ -1333,12 +1350,13 @@ void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq)
}
static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
- u8 trace_chan_id, u64 addr)
+ struct cs_etm_traceid_queue *tidq,
+ struct cs_etm_packet *packet, u64 addr)
{
u8 instrBytes[2];
- cs_etm__mem_access(etmq, trace_chan_id, addr, ARRAY_SIZE(instrBytes),
- instrBytes, 0);
+ cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+ ARRAY_SIZE(instrBytes), instrBytes);
/*
* T32 instruction size is indicated by bits[15:11] of the first
* 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
@@ -1371,16 +1389,16 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
}
static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
- u64 trace_chan_id,
- const struct cs_etm_packet *packet,
+ struct cs_etm_traceid_queue *tidq,
+ struct cs_etm_packet *packet,
u64 offset)
{
if (packet->isa == CS_ETM_ISA_T32) {
u64 addr = packet->start_addr;
while (offset) {
- addr += cs_etm__t32_instr_size(etmq,
- trace_chan_id, addr);
+ addr += cs_etm__t32_instr_size(etmq, tidq, packet,
+ addr);
offset--;
}
return addr;
@@ -1490,34 +1508,51 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
return etmq->buf_len;
}
-static void cs_etm__set_thread(struct cs_etm_queue *etmq,
- struct cs_etm_traceid_queue *tidq, pid_t tid,
- ocsd_ex_level el)
+/*
+ * Convert a raw thread number to a thread struct and assign it to **thread.
+ */
+static int cs_etm__etmq_update_thread(struct cs_etm_queue *etmq,
+ ocsd_ex_level el, pid_t tid,
+ struct thread **thread)
{
struct machine *machine = cs_etm__get_machine(etmq, el);
+ if (!machine || !*thread)
+ return -EINVAL;
+
if (tid != -1) {
- thread__zput(tidq->thread);
- tidq->thread = machine__find_thread(machine, -1, tid);
+ thread__zput(*thread);
+ *thread = machine__find_thread(machine, -1, tid);
}
/* Couldn't find a known thread */
- if (!tidq->thread)
- tidq->thread = machine__idle_thread(machine);
+ if (!*thread)
+ *thread = machine__idle_thread(machine);
- tidq->el = el;
+ return 0;
}
-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
- u8 trace_chan_id, ocsd_ex_level el)
+/*
+ * Set the thread and EL of the decode context which is ahead in time of the
+ * frontend context.
+ */
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+ u8 trace_chan_id,
+ ocsd_ex_level el, pid_t tid)
{
struct cs_etm_traceid_queue *tidq;
+ int ret;
tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
if (!tidq)
return -EINVAL;
- cs_etm__set_thread(etmq, tidq, tid, el);
+ ret = cs_etm__etmq_update_thread(etmq, el, tid,
+ &tidq->decode_thread);
+ if (ret)
+ return ret;
+
+ tidq->decode_el = el;
return 0;
}
@@ -1527,8 +1562,8 @@ bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq)
}
static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
- u64 trace_chan_id,
- const struct cs_etm_packet *packet,
+ struct cs_etm_traceid_queue *tidq,
+ struct cs_etm_packet *packet,
struct perf_sample *sample)
{
/*
@@ -1545,14 +1580,14 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
* cs_etm__t32_instr_size().
*/
if (packet->isa == CS_ETM_ISA_T32)
- sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
+ sample->insn_len = cs_etm__t32_instr_size(etmq, tidq, packet,
sample->ip);
/* Otherwise, A64 and A32 instruction size are always 32-bit. */
else
sample->insn_len = 4;
- cs_etm__mem_access(etmq, trace_chan_id, sample->ip, sample->insn_len,
- (void *)sample->insn, 0);
+ cs_etm__frontend_mem_access(etmq, tidq, packet, sample->ip,
+ sample->insn_len, (void *)sample->insn);
}
u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp)
@@ -1579,6 +1614,7 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq,
+ struct cs_etm_packet *packet,
u64 addr, u64 period)
{
int ret = 0;
@@ -1588,15 +1624,15 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
perf_sample__init(&sample, /*all=*/true);
event->sample.header.type = PERF_RECORD_SAMPLE;
- event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
+ event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, packet->el);
event->sample.header.size = sizeof(struct perf_event_header);
/* Set time field based on etm auxtrace config. */
sample.time = cs_etm__resolve_sample_time(etmq, tidq);
sample.ip = addr;
- sample.pid = thread__pid(tidq->thread);
- sample.tid = thread__tid(tidq->thread);
+ sample.pid = thread__pid(tidq->frontend_thread);
+ sample.tid = thread__tid(tidq->frontend_thread);
sample.id = etmq->etm->instructions_id;
sample.stream_id = etmq->etm->instructions_id;
sample.period = period;
@@ -1604,7 +1640,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.flags = tidq->prev_packet->flags;
sample.cpumode = event->sample.header.misc;
- cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
+ cs_etm__copy_insn(etmq, tidq, tidq->packet, &sample);
if (etm->synth_opts.last_branch)
sample.branch_stack = tidq->last_branch;
@@ -1649,15 +1685,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
event->sample.header.type = PERF_RECORD_SAMPLE;
event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
- tidq->prev_packet_el);
+ tidq->prev_packet->el);
event->sample.header.size = sizeof(struct perf_event_header);
/* Set time field based on etm auxtrace config. */
sample.time = cs_etm__resolve_sample_time(etmq, tidq);
sample.ip = ip;
- sample.pid = thread__pid(tidq->prev_packet_thread);
- sample.tid = thread__tid(tidq->prev_packet_thread);
+ sample.pid = thread__pid(tidq->frontend_thread);
+ sample.tid = thread__tid(tidq->frontend_thread);
sample.addr = cs_etm__first_executed_instr(tidq->packet);
sample.id = etmq->etm->branches_id;
sample.stream_id = etmq->etm->branches_id;
@@ -1666,8 +1702,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
sample.flags = tidq->prev_packet->flags;
sample.cpumode = event->sample.header.misc;
- cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet,
- &sample);
+ cs_etm__copy_insn(etmq, tidq, tidq->prev_packet, &sample);
/*
* perf report cannot handle events without a branch stack
@@ -1788,7 +1823,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
{
struct cs_etm_auxtrace *etm = etmq->etm;
int ret;
- u8 trace_chan_id = tidq->trace_chan_id;
u64 instrs_prev;
/* Get instructions remainder from previous packet */
@@ -1874,10 +1908,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
* been executed, but PC has not advanced to next
* instruction)
*/
- addr = cs_etm__instr_addr(etmq, trace_chan_id,
- tidq->packet, offset - 1);
+ addr = cs_etm__instr_addr(etmq, tidq, tidq->packet,
+ offset - 1);
ret = cs_etm__synth_instruction_sample(
- etmq, tidq, addr,
+ etmq, tidq, tidq->packet, addr,
etm->instructions_sample_period);
if (ret)
return ret;
@@ -1959,7 +1993,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample(
- etmq, tidq, addr,
+ etmq, tidq, tidq->prev_packet, addr,
tidq->period_instructions);
if (err)
return err;
@@ -2014,7 +2048,7 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
addr = cs_etm__last_executed_instr(tidq->prev_packet);
err = cs_etm__synth_instruction_sample(
- etmq, tidq, addr,
+ etmq, tidq, tidq->prev_packet, addr,
tidq->period_instructions);
if (err)
return err;
@@ -2051,9 +2085,9 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
return etmq->buf_len;
}
-static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
- struct cs_etm_packet *packet,
- u64 end_addr)
+static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq,
+ struct cs_etm_traceid_queue *tidq,
+ struct cs_etm_packet *packet, u64 end_addr)
{
/* Initialise to keep compiler happy */
u16 instr16 = 0;
@@ -2075,8 +2109,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
* so below only read 2 bytes as instruction size for T32.
*/
addr = end_addr - 2;
- cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr16),
- (u8 *)&instr16, 0);
+ cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+ sizeof(instr16), (u8 *)&instr16);
if ((instr16 & 0xFF00) == 0xDF00)
return true;
@@ -2091,8 +2125,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
* +---------+---------+-------------------------+
*/
addr = end_addr - 4;
- cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
- (u8 *)&instr32, 0);
+ cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+ sizeof(instr32), (u8 *)&instr32);
if ((instr32 & 0x0F000000) == 0x0F000000 &&
(instr32 & 0xF0000000) != 0xF0000000)
return true;
@@ -2108,8 +2142,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
* +-----------------------+---------+-----------+
*/
addr = end_addr - 4;
- cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
- (u8 *)&instr32, 0);
+ cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
+ sizeof(instr32), (u8 *)&instr32);
if ((instr32 & 0xFFE0001F) == 0xd4000001)
return true;
@@ -2125,7 +2159,6 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq, u64 magic)
{
- u8 trace_chan_id = tidq->trace_chan_id;
struct cs_etm_packet *packet = tidq->packet;
struct cs_etm_packet *prev_packet = tidq->prev_packet;
@@ -2140,7 +2173,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
*/
if (magic == __perf_cs_etmv4_magic) {
if (packet->exception_number == CS_ETMV4_EXC_CALL &&
- cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+ cs_etm__is_svc_instr(etmq, tidq, prev_packet,
prev_packet->end_addr))
return true;
}
@@ -2178,7 +2211,6 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
struct cs_etm_traceid_queue *tidq,
u64 magic)
{
- u8 trace_chan_id = tidq->trace_chan_id;
struct cs_etm_packet *packet = tidq->packet;
struct cs_etm_packet *prev_packet = tidq->prev_packet;
@@ -2204,7 +2236,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
* (SMC, HVC) are taken as sync exceptions.
*/
if (packet->exception_number == CS_ETMV4_EXC_CALL &&
- !cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
+ !cs_etm__is_svc_instr(etmq, tidq, prev_packet,
prev_packet->end_addr))
return true;
@@ -2228,7 +2260,6 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
{
struct cs_etm_packet *packet = tidq->packet;
struct cs_etm_packet *prev_packet = tidq->prev_packet;
- u8 trace_chan_id = tidq->trace_chan_id;
u64 magic;
int ret;
@@ -2309,11 +2340,11 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
if (prev_packet->flags == (PERF_IP_FLAG_BRANCH |
PERF_IP_FLAG_RETURN |
PERF_IP_FLAG_INTERRUPT) &&
- cs_etm__is_svc_instr(etmq, trace_chan_id,
- packet, packet->start_addr))
+ cs_etm__is_svc_instr(etmq, tidq, packet, packet->start_addr)) {
prev_packet->flags = PERF_IP_FLAG_BRANCH |
PERF_IP_FLAG_RETURN |
PERF_IP_FLAG_SYSCALLRET;
+ }
break;
case CS_ETM_DISCONTINUITY:
/*
@@ -2394,6 +2425,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
PERF_IP_FLAG_RETURN |
PERF_IP_FLAG_INTERRUPT;
break;
+ case CS_ETM_CONTEXT:
case CS_ETM_EMPTY:
default:
break;
@@ -2469,6 +2501,19 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
*/
cs_etm__sample(etmq, tidq);
break;
+ case CS_ETM_CONTEXT:
+ /*
+ * Update context but don't swap packet. Keep the
+ * previous one for branch source address info, if
+ * tracing the kernel the context packet will be emitted
+ * between two ranges.
+ */
+ ret = cs_etm__etmq_update_thread(etmq, tidq->packet->el,
+ tidq->packet->tid,
+ &tidq->frontend_thread);
+ if (ret)
+ goto out;
+ break;
case CS_ETM_EXCEPTION:
case CS_ETM_EXCEPTION_RET:
/*
@@ -2497,6 +2542,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
}
}
+out:
return ret;
}
@@ -2620,7 +2666,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
if (!tidq)
continue;
- if (tid == -1 || thread__tid(tidq->thread) == tid)
+ if (tid == -1 || thread__tid(tidq->frontend_thread) == tid)
cs_etm__run_per_thread_timeless_decoder(etmq);
} else
cs_etm__run_per_cpu_timeless_decoder(etmq);
@@ -3328,7 +3374,7 @@ static int cs_etm__create_queue_decoders(struct cs_etm_queue *etmq)
*/
if (cs_etm_decoder__add_mem_access_cb(etmq->decoder,
0x0L, ((u64) -1L),
- cs_etm__mem_access))
+ cs_etm__decoder_mem_access))
goto out_free_decoder;
zfree(&t_params);
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index aa9bb4a32eca..b81099c2b301 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -158,6 +158,7 @@ enum cs_etm_sample_type {
CS_ETM_DISCONTINUITY,
CS_ETM_EXCEPTION,
CS_ETM_EXCEPTION_RET,
+ CS_ETM_CONTEXT,
};
enum cs_etm_isa {
@@ -184,6 +185,8 @@ struct cs_etm_packet {
u8 last_instr_size;
u8 trace_chan_id;
int cpu;
+ int el;
+ pid_t tid;
};
#define CS_ETM_PACKET_MAX_BUFFER 1024
@@ -259,8 +262,9 @@ enum cs_etm_pid_fmt {
#include <opencsd/ocsd_if_types.h>
int cs_etm__get_cpu(struct cs_etm_queue *etmq, u8 trace_chan_id, int *cpu);
enum cs_etm_pid_fmt cs_etm__get_pid_fmt(struct cs_etm_queue *etmq);
-int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
- u8 trace_chan_id, ocsd_ex_level el);
+int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
+ u8 trace_chan_id, ocsd_ex_level el,
+ pid_t tid);
bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq);
void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
u8 trace_chan_id);
--
2.34.1
* [PATCH v3 02/19] perf test: Add workload-ctl option
From: James Clark @ 2026-06-03 10:17 UTC
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Add a --workload-ctl=fifo:ctl-fifo[,ack-fifo] option for 'perf test
-w'. When set, run_workload() opens the named FIFO, writes enable before
invoking the builtin workload, writes disable before returning, and
waits for ack responses when an ack FIFO is provided to ensure that the
workload doesn't run until the events are enabled.
This can be used to limit the scope of the recording to only the
workload execution and avoid recording Perf setup and teardown code if
Perf record is started with events disabled (-D -1).
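
The enable/ack handshake described above might look roughly like the
following. This is a minimal sketch, not the code added by this patch:
write_all() and ctl_send() are hypothetical helpers, and the command
and reply strings ("enable\n", "disable\n", "ack\n") are assumptions
based on the description above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Send one command on the control fd. If ack_fd is valid, block until the
 * reader has replied with "ack\n" so the caller knows the command took
 * effect before it carries on.
 */
static int ctl_send(int ctl_fd, int ack_fd, const char *cmd)
{
	static const char ack[] = "ack\n";
	char buf[sizeof(ack) - 1];
	size_t got = 0;
	int ret;

	assert(cmd);

	ret = write_all(ctl_fd, cmd, strlen(cmd));
	if (ret || ack_fd < 0)
		return ret;

	while (got < sizeof(buf)) {
		ssize_t n = read(ack_fd, buf + got, sizeof(buf) - got);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (n == 0)
			return -EPIPE;	/* peer closed before acking */
		got += (size_t)n;
	}

	return memcmp(buf, ack, sizeof(buf)) ? -EPROTO : 0;
}
```

A runner following this scheme would call ctl_send(ctl_fd, ack_fd,
"enable\n") before starting the workload and ctl_send(ctl_fd, ack_fd,
"disable\n") once it finishes. Waiting for the ack is what guarantees
that the workload does not start before the events are enabled.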
Assisted-by: Codex:GPT-5.5
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/Documentation/perf-test.txt | 6 ++
tools/perf/tests/builtin-test.c | 184 ++++++++++++++++++++++++++++++++-
2 files changed, 188 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 32da0d1fa86a..1faf30d4a7be 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -69,3 +69,9 @@ OPTIONS
--list-workloads::
List the available workloads to use with -w/--workload.
+
+--workload-ctl=fifo:ctl-fifo[,ack-fifo]::
+ Write 'enable' to ctl-fifo before running the workload and 'disable'
+ before returning. If ack-fifo is provided, the workload runner waits for
+ an 'ack' response after each command. This scopes the recording to only
+ the workload if used with 'perf record -D -1 --control ...'.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index f2c135891477..a9e67d7da700 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -50,6 +50,7 @@ static bool sequential;
static unsigned int runs_per_test = 1;
const char *dso_to_test;
const char *test_objdump_path = "objdump";
+static const char *workload_control;
/*
* List of architecture specific tests. Not a weak symbol as the array length is
@@ -161,6 +162,11 @@ static struct test_workload *workloads[] = {
#endif
};
+struct workload_control {
+ int ctl_fd;
+ int ack_fd;
+};
+
#define workloads__for_each(workload) \
for (unsigned i = 0; i < ARRAY_SIZE(workloads) && ({ workload = workloads[i]; 1; }); i++)
@@ -711,13 +717,185 @@ static int workloads__fprintf_list(FILE *fp)
return printed;
}
+static int perf_control_open_fifo(struct workload_control *ctl, const char *str)
+{
+ char *s, *p;
+ int ret;
+
+ if (strncmp(str, "fifo:", 5))
+ return -EINVAL;
+
+ str += 5;
+ if (!*str || *str == ',')
+ return -EINVAL;
+
+ s = strdup(str);
+ if (!s)
+ return -ENOMEM;
+
+ p = strchr(s, ',');
+ if (p)
+ *p = '\0';
+
+ ctl->ctl_fd = open(s, O_WRONLY | O_CLOEXEC);
+ if (ctl->ctl_fd < 0) {
+ ret = -errno;
+ pr_err("Failed to open workload control FIFO '%s': %m\n", s);
+ free(s);
+ return ret;
+ }
+
+ if (p && *++p) {
+ ctl->ack_fd = open(p, O_RDONLY | O_CLOEXEC);
+ if (ctl->ack_fd < 0) {
+ ret = -errno;
+ pr_err("Failed to open workload control ack FIFO '%s': %m\n", p);
+ close(ctl->ctl_fd);
+ ctl->ctl_fd = -1;
+ free(s);
+ return ret;
+ }
+ }
+
+ free(s);
+ return 0;
+}
+
+static int perf_control_open(struct workload_control *ctl)
+{
+ int ret;
+
+ if (!workload_control)
+ return 0;
+
+ ret = perf_control_open_fifo(ctl, workload_control);
+
+ if (ret == -EINVAL) {
+ pr_err("Unsupported workload control spec '%s', expected fifo:ctl-fifo[,ack-fifo]\n",
+ workload_control);
+ }
+
+ return ret;
+}
+
+static void perf_control_close(struct workload_control *ctl)
+{
+ if (ctl->ctl_fd >= 0) {
+ close(ctl->ctl_fd);
+ ctl->ctl_fd = -1;
+ }
+ if (ctl->ack_fd >= 0) {
+ close(ctl->ack_fd);
+ ctl->ack_fd = -1;
+ }
+}
+
+static int perf_control_write_cmd(int fd, const char *cmd)
+{
+ size_t len = strlen(cmd);
+ ssize_t ret;
+
+ while (len) {
+ ret = write(fd, cmd, len);
+ if (ret < 0) {
+ if (errno == EINTR)
+ continue;
+ pr_err("Failed to write perf control command: %m\n");
+ return -1;
+ }
+
+ if (!ret) {
+ pr_err("Failed to write perf control command: short write\n");
+ return -1;
+ }
+
+ cmd += ret;
+ len -= ret;
+ }
+
+ return 0;
+}
+
+static int perf_control_read_ack(int fd)
+{
+ char buf[16];
+ ssize_t ret;
+
+ do {
+ ret = read(fd, buf, sizeof(buf) - 1);
+ } while (ret < 0 && errno == EINTR);
+
+ if (ret < 0) {
+ pr_err("Failed to read perf control ack: %m\n");
+ return -1;
+ }
+
+ if (!ret) {
+ pr_err("Unexpected EOF while reading perf control ack\n");
+ return -1;
+ }
+
+ buf[ret] = '\0';
+ for (ssize_t i = 0; i < ret; i++) {
+ if (buf[i] == '\n' || buf[i] == '\0') {
+ buf[i] = '\0';
+ break;
+ }
+ }
+
+ if (strcmp(buf, "ack")) {
+ pr_err("Unexpected perf control ack: %s\n", buf);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int perf_control_send(struct workload_control *ctl, const char *cmd)
+{
+ if (ctl->ctl_fd < 0)
+ return 0;
+
+ if (perf_control_write_cmd(ctl->ctl_fd, cmd))
+ return -1;
+
+ if (ctl->ack_fd >= 0 && perf_control_read_ack(ctl->ack_fd))
+ return -1;
+
+ return 0;
+}
+
static int run_workload(const char *work, int argc, const char **argv)
{
struct test_workload *twl;
workloads__for_each(twl) {
- if (!strcmp(twl->name, work))
- return twl->func(argc, argv);
+ struct workload_control ctl = {
+ .ctl_fd = -1,
+ .ack_fd = -1,
+ };
+ int control_ret, ret;
+
+ if (strcmp(twl->name, work))
+ continue;
+
+ ret = perf_control_open(&ctl);
+ if (ret)
+ return ret;
+
+ if (perf_control_send(&ctl, "enable\n")) {
+ perf_control_close(&ctl);
+ return -1;
+ }
+
+ ret = twl->func(argc, argv);
+
+ control_ret = perf_control_send(&ctl, "disable\n");
+ perf_control_close(&ctl);
+ if (control_ret)
+ return -1;
+
+ return ret;
}
pr_info("No workload found: %s\n", work);
@@ -799,6 +977,8 @@ int cmd_test(int argc, const char **argv)
OPT_UINTEGER('r', "runs-per-test", &runs_per_test,
"Run each test the given number of times, default 1"),
OPT_STRING('w', "workload", &workload, "work", "workload to run for testing, use '--list-workloads' to list the available ones."),
+ OPT_STRING(0, "workload-ctl", &workload_control, "fifo:ctl-fifo[,ack-fifo]",
+ "Write enable to the fifo just before running the workload and disable after, with optional ack from ack-fifo"),
OPT_BOOLEAN(0, "list-workloads", &list_workloads, "List the available builtin workloads to use with -w/--workload"),
OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"),
OPT_STRING(0, "objdump", &test_objdump_path, "path",
--
2.34.1
* [PATCH v3 03/19] perf test: Add a workload that forces context switches
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
2026-06-03 10:17 ` [PATCH v3 01/19] " James Clark
2026-06-03 10:17 ` [PATCH v3 02/19] perf test: Add workload-ctl option James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:28 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 04/19] perf test cs-etm: Test process attribution James Clark
` (15 subsequent siblings)
18 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
This workload launches two processes that block when reading from and
writing to each other, forcing the other process to be scheduled for each
read/write pair.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/Documentation/perf-test.txt | 7 +-
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/tests.h | 1 +
tools/perf/tests/workloads/Build | 1 +
tools/perf/tests/workloads/context_switch_loop.c | 101 +++++++++++++++++++++++
5 files changed, 108 insertions(+), 3 deletions(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 1faf30d4a7be..9c0d7ac2bc64 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -55,15 +55,16 @@ OPTIONS
-w::
--workload=::
- Run a built-in workload, to list them use '--list-workloads', current ones include:
- noploop, thloop, leafloop, sqrtloop, brstack, datasym and landlock.
+ Run a built-in workload, to list them use '--list-workloads', current
+ ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
+ context_switch_loop and landlock.
Used with the shell script regression tests.
Some accept an extra parameter:
seconds: leafloop, noploop, sqrtloop, thloop
- nrloops: brstack
+ nrloops: brstack, context_switch_loop
The datasym and landlock workloads don't accept any.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index a9e67d7da700..2830a431771f 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -156,6 +156,7 @@ static struct test_workload *workloads[] = {
&workload__landlock,
&workload__traploop,
&workload__inlineloop,
+ &workload__context_switch_loop,
#ifdef HAVE_RUST_SUPPORT
&workload__code_with_type,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index ee00518bf36f..79f50bacfc94 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -242,6 +242,7 @@ DECLARE_WORKLOAD(datasym);
DECLARE_WORKLOAD(landlock);
DECLARE_WORKLOAD(traploop);
DECLARE_WORKLOAD(inlineloop);
+DECLARE_WORKLOAD(context_switch_loop);
#ifdef HAVE_RUST_SUPPORT
DECLARE_WORKLOAD(code_with_type);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 2ef97f7affce..3bda6da04a35 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -9,6 +9,7 @@ perf-test-y += datasym.o
perf-test-y += landlock.o
perf-test-y += traploop.o
perf-test-y += inlineloop.o
+perf-test-y += context_switch_loop.o
ifeq ($(CONFIG_RUST_SUPPORT),y)
perf-test-y += code_with_type.o
diff --git a/tools/perf/tests/workloads/context_switch_loop.c b/tools/perf/tests/workloads/context_switch_loop.c
new file mode 100644
index 000000000000..173d770ae619
--- /dev/null
+++ b/tools/perf/tests/workloads/context_switch_loop.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/compiler.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../tests.h"
+
+static int loops = 100;
+static char buf;
+int context_switch_loop_work = 1234;
+
+#define write_block(fd) \
+ do { \
+ if (write(fd, &buf, 1) <= 0) \
+ exit(1); \
+ } while (0)
+
+#define read_block(fd) \
+ do { \
+ if (read(fd, &buf, 1) <= 0) \
+ exit(1); \
+ } while (0)
+
+/* Not static to avoid LTO clobbering the function name */
+void context_switch_loop_proc1(int in_fd, int out_fd);
+noinline void context_switch_loop_proc1(int in_fd, int out_fd)
+{
+ for (int i = 0; i < loops; i++) {
+ read_block(in_fd);
+ context_switch_loop_work += i * 3;
+ write_block(out_fd);
+ }
+}
+
+void context_switch_loop_proc2(int in_fd, int out_fd);
+noinline void context_switch_loop_proc2(int in_fd, int out_fd)
+{
+ for (int i = 0; i < loops; i++) {
+ write_block(out_fd);
+ context_switch_loop_work += i * 7;
+ read_block(in_fd);
+ }
+}
+
+/*
+ * Launches two processes that take turns to execute a multiplication N times
+ */
+static int context_switch_loop(int argc, const char **argv)
+{
+ int a_to_b[2], b_to_a[2];
+ pid_t proc1_pid;
+ int status;
+
+ if (argc > 0) {
+ loops = atoi(argv[0]);
+ if (loops < 0) {
+ fprintf(stderr, "Invalid number of loops: %s\n", argv[0]);
+ return 1;
+ }
+ }
+
+ if (pipe(a_to_b) || pipe(b_to_a)) {
+ perror("Pipe error");
+ return 1;
+ }
+
+ proc1_pid = fork();
+ if (proc1_pid < 0) {
+ perror("Fork error");
+ return 1;
+ }
+
+ if (!proc1_pid) {
+ close(a_to_b[0]);
+ close(b_to_a[1]);
+ prctl(PR_SET_NAME, "proc1", 0, 0, 0);
+ context_switch_loop_proc1(b_to_a[0], a_to_b[1]);
+ close(a_to_b[1]);
+ close(b_to_a[0]);
+ exit(0);
+ }
+
+ close(a_to_b[1]);
+ close(b_to_a[0]);
+ prctl(PR_SET_NAME, "proc2", 0, 0, 0);
+ context_switch_loop_proc2(a_to_b[0], b_to_a[1]);
+ close(a_to_b[0]);
+ close(b_to_a[1]);
+
+ if (waitpid(proc1_pid, &status, 0) != proc1_pid || !WIFEXITED(status) ||
+ WEXITSTATUS(status))
+ return 1;
+
+ return 0;
+}
+
+DEFINE_WORKLOAD(context_switch_loop);
--
2.34.1
* [PATCH v3 04/19] perf test cs-etm: Test process attribution
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (2 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 03/19] perf test: Add a workload that forces context switches James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 05/19] perf test: Add deterministic workload James Clark
` (14 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Run the context switch workload on one CPU and trace it to test that
symbols are attributed to the correct process and that the attribution
changes at the exact point that the context switch happened.
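The attribution check can be pictured with the following sketch, which is
only an illustration and not part of the test. It assumes each output line
puts the process name before the symbol, as with 'perf script -F
comm,pid,tid,ip,sym', and the helper names comm_then_sym() and
is_misattributed() are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * True when 'comm' appears on the line and 'sym' appears somewhere after it,
 * which matches what a "comm.*sym" grep pattern would accept.
 */
static bool comm_then_sym(const char *line, const char *comm, const char *sym)
{
	const char *c = strstr(line, comm);

	return c && strstr(c + strlen(comm), sym);
}

/* True when a line pairs one process name with the other process's function. */
bool is_misattributed(const char *line)
{
	return comm_then_sym(line, "proc1", "context_switch_loop_proc2") ||
	       comm_then_sym(line, "proc2", "context_switch_loop_proc1");
}
```

A single misattributed line is enough to fail, so the check is sensitive to
the sample right at the switch point and not just to the overall totals.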
Signed-off-by: James Clark <james.clark@linaro.org>
---
.../tests/shell/coresight/context_switch_thread.sh | 69 ++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/tools/perf/tests/shell/coresight/context_switch_thread.sh b/tools/perf/tests/shell/coresight/context_switch_thread.sh
new file mode 100755
index 000000000000..0992c35a329d
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/context_switch_thread.sh
@@ -0,0 +1,69 @@
+#!/bin/bash -e
+# Coresight context switch thread attribution (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+if [ "$(id -u)" != 0 ]; then
+ # Requires root for "-C 0" in record command
+ echo "[Skip] No root permission"
+ exit 2
+fi
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_context_switch.XXXXX)
+
+cleanup() {
+ rm -rf "${tmpdir}"
+ trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+ cleanup
+ exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+check_samples() {
+ owner_samples=$(grep -c "proc1.*context_switch_loop_proc1" "$tmpdir/script" || true)
+ next_samples=$(grep -c "proc2.*context_switch_loop_proc2" "$tmpdir/script" || true)
+
+ if [ "$owner_samples" -eq 0 ] || [ "$next_samples" -eq 0 ]; then
+ echo "No samples found"
+ cleanup
+ exit 1
+ fi
+
+ if grep "proc2.*context_switch_loop_proc1" "$tmpdir/script"; then
+ echo "Thread1 symbol was attributed to proc2"
+ cleanup
+ exit 1
+ fi
+
+ if grep "proc1.*context_switch_loop_proc2" "$tmpdir/script"; then
+ echo "Thread2 symbol was attributed to proc1"
+ cleanup
+ exit 1
+ fi
+}
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+# Pin to one CPU so the two threads alternate running but record into the same
+# trace buffer. Start disabled and use the control FIFO to only record the
+# workload and not startup.
+perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -C 0 -D -1 --control fifo:"$cf","$af" -- \
+ taskset --cpu-list 0 perf test --workload-ctl fifo:"$cf","$af" \
+ -w context_switch_loop > /dev/null 2>&1
+
+# Test both instruction and branch sample generation modes.
+perf script -i "$tmpdir/data" --itrace=i4 -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null
+check_samples
+perf script -i "$tmpdir/data" --itrace=b -F comm,pid,tid,ip,sym > "$tmpdir/script" 2>/dev/null
+check_samples
+
+cleanup
+exit 0
--
2.34.1
* [PATCH v3 05/19] perf test: Add deterministic workload
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (3 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 04/19] perf test cs-etm: Test process attribution James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:24 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
` (13 subsequent siblings)
18 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Add a workload that does the same thing every time for testing CPU trace
decoding.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/Documentation/perf-test.txt | 4 +--
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/tests.h | 1 +
tools/perf/tests/workloads/Build | 2 ++
tools/perf/tests/workloads/deterministic.c | 39 ++++++++++++++++++++++++++++++
5 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 9c0d7ac2bc64..7ec70c054cac 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -57,7 +57,7 @@ OPTIONS
--workload=::
Run a built-in workload, to list them use '--list-workloads', current
ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
- context_switch_loop and landlock.
+ context_switch_loop, deterministic and landlock.
Used with the shell script regression tests.
@@ -66,7 +66,7 @@ OPTIONS
seconds: leafloop, noploop, sqrtloop, thloop
nrloops: brstack, context_switch_loop
- The datasym and landlock workloads don't accept any.
+ The datasym, landlock and deterministic workloads don't accept any.
--list-workloads::
List the available workloads to use with -w/--workload.
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 2830a431771f..5a2ab67cd85d 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -157,6 +157,7 @@ static struct test_workload *workloads[] = {
&workload__traploop,
&workload__inlineloop,
&workload__context_switch_loop,
+ &workload__deterministic,
#ifdef HAVE_RUST_SUPPORT
&workload__code_with_type,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 79f50bacfc94..f8bba2d68769 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -243,6 +243,7 @@ DECLARE_WORKLOAD(landlock);
DECLARE_WORKLOAD(traploop);
DECLARE_WORKLOAD(inlineloop);
DECLARE_WORKLOAD(context_switch_loop);
+DECLARE_WORKLOAD(deterministic);
#ifdef HAVE_RUST_SUPPORT
DECLARE_WORKLOAD(code_with_type);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index 3bda6da04a35..cca7ad354227 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -10,6 +10,7 @@ perf-test-y += landlock.o
perf-test-y += traploop.o
perf-test-y += inlineloop.o
perf-test-y += context_switch_loop.o
+perf-test-y += deterministic.o
ifeq ($(CONFIG_RUST_SUPPORT),y)
perf-test-y += code_with_type.o
@@ -22,3 +23,4 @@ CFLAGS_brstack.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
CFLAGS_datasym.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
CFLAGS_traploop.o = -g -O0 -fno-inline -U_FORTIFY_SOURCE
CFLAGS_inlineloop.o = -g -O2
+CFLAGS_deterministic.o = -g -O0 -U_FORTIFY_SOURCE
diff --git a/tools/perf/tests/workloads/deterministic.c b/tools/perf/tests/workloads/deterministic.c
new file mode 100644
index 000000000000..3caea8564043
--- /dev/null
+++ b/tools/perf/tests/workloads/deterministic.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/compiler.h>
+#include "../tests.h"
+
+int dt_work = 1234;
+
+static noinline void function1(void)
+{
+ dt_work *= 7;
+ dt_work *= 7;
+ dt_work *= 7;
+}
+
+static noinline void function2(void)
+{
+ dt_work *= 7;
+ dt_work *= 7;
+ dt_work *= 7;
+}
+
+static int deterministic(int argc __maybe_unused,
+ const char **argv __maybe_unused)
+{
+ dt_work *= 7;
+ dt_work *= 7;
+ dt_work *= 7;
+
+ function1();
+
+ dt_work *= 7;
+ dt_work *= 7;
+ dt_work *= 7;
+
+ function2();
+
+ return 0;
+}
+
+DEFINE_WORKLOAD(deterministic);
--
2.34.1
* [PATCH v3 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (4 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 05/19] perf test: Add deterministic workload James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:30 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 07/19] perf test cs-etm: Remove asm_pure_loop test James Clark
` (12 subsequent siblings)
18 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Testing a long sequence without branches seems like it would be better
as a decoder unit test, and this test doesn't test decoding either, so
it's not clear what bugs this is trying to catch.
The new deterministic workload has somewhat long sequences when built
unoptimized, and we can always increase them later if we want to. But
now we test that decoding always gives the same result for the same
sequence of code which we've never had before.
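To see why a per-instruction sample stream can be compared against a short
list of source lines, consider this sketch. It is an illustration only:
collapse_runs() is a hypothetical helper that mimics what uniq(1) does to
adjacent duplicate lines, applied here to line numbers.

```c
#include <stddef.h>

/*
 * Collapse runs of equal adjacent values in place, like uniq(1) does for
 * adjacent lines. Returns the new length.
 */
size_t collapse_runs(int *lines, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out == 0 || lines[out - 1] != lines[i])
			lines[out++] = lines[i];
	}
	return out;
}
```

Several consecutive instructions usually map to the same source line, so
collapsing the runs leaves one entry per visit to a line. The result then
depends on the order in which lines were executed and not on how many
instructions the compiler emitted for each one.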
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/coresight/deterministic.sh | 71 ++++++++++++++++++++++
.../tests/shell/coresight/unroll_loop_thread_10.sh | 22 -------
2 files changed, 71 insertions(+), 22 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/deterministic.sh b/tools/perf/tests/shell/coresight/deterministic.sh
new file mode 100755
index 000000000000..52e033fd6b82
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/deterministic.sh
@@ -0,0 +1,71 @@
+#!/bin/bash -e
+# Coresight deterministic workload decode (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_deterministic.XXXXX)
+
+cleanup() {
+ rm -rf "${tmpdir}"
+ trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+ cleanup
+ exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+# Start disabled and use the control FIFO to only record the workload and not
+# startup.
+perf record -o "$tmpdir/data" -e cs_etm//u -D -1 --control fifo:"$cf","$af" -- \
+ perf test --workload-ctl fifo:"$cf","$af" -w deterministic > /dev/null 2>&1
+
+perf script -i "$tmpdir/data" --itrace=i1i -F ip,srcline | \
+ grep "deterministic.c" | uniq > "$tmpdir/script" 2>/dev/null
+
+
+# Remove open brace lines as they may not be hit depending on the compiler
+sed -i \
+ -e '/deterministic.c:8$/d' \
+ -e '/deterministic.c:15$/d' \
+ -e '/deterministic.c:23$/d' \
+ "$tmpdir/script"
+
+cat > "$tmpdir/expected" << EOF
+ deterministic.c:24
+ deterministic.c:25
+ deterministic.c:26
+ deterministic.c:28
+ deterministic.c:9
+ deterministic.c:10
+ deterministic.c:11
+ deterministic.c:12
+ deterministic.c:30
+ deterministic.c:31
+ deterministic.c:32
+ deterministic.c:34
+ deterministic.c:16
+ deterministic.c:17
+ deterministic.c:18
+ deterministic.c:19
+ deterministic.c:36
+ deterministic.c:37
+EOF
+
+if ! diff -q "$tmpdir/script" "$tmpdir/expected"; then
+ echo "FAIL: line numbers don't match expected: "
+ head -n 100 "$tmpdir/script"
+ cleanup
+ exit 1
+fi
+
+cleanup
+exit 0
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh b/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh
deleted file mode 100755
index cb3e97a0a89f..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread_10.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Unroll Loop Thread 10 (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="unroll_loop_thread"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="10"
-DATV="10"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err
--
2.34.1
* [PATCH v3 07/19] perf test cs-etm: Remove asm_pure_loop test
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (5 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
` (11 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
It's not obvious what this test is for so remove it. It's not a stress
test because it doesn't output lots of data and it's not a functional
test because it only looks for raw trace output. It seems to imply that
a program written in assembly influences whether trace would be
generated by the CPU or not, but the CPU doesn't know what language the
program is written in.
We already have lots of Coresight tests that test the full pipeline
including decoding, and in many more modes of operation than this one,
so if no trace was collected they will already fail leaving this one
redundant.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 ----------------------
1 file changed, 22 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop.sh b/tools/perf/tests/shell/coresight/asm_pure_loop.sh
deleted file mode 100755
index 0301904b9637..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / ASM Pure Loop (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="asm_pure_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS=""
-DATV="out"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err
--
2.34.1
* [PATCH v3 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (6 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 07/19] perf test cs-etm: Remove asm_pure_loop test James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 09/19] perf test: Add named_threads workload James Clark
` (10 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Like asm_pure_loop, this memcpy test only checks that 10 of each of a
few trace packet types occur after recording a lot of trace, which isn't
more specific than other existing Coresight tests.
Assume it was supposed to be a stress test for dumping and replace it
with one that doesn't require a custom binary and checks for a specific
amount of raw output. Don't bother checking for packets because the
other tests that test decoding will catch issues with malformed data.
This also adds coverage for exit snapshot mode which was missing.
Signed-off-by: James Clark <james.clark@linaro.org>
---
.../tests/shell/coresight/memcpy_thread_16k_10.sh | 22 ----------
.../perf/tests/shell/coresight/raw_dump_stress.sh | 48 ++++++++++++++++++++++
2 files changed, 48 insertions(+), 22 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh b/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh
deleted file mode 100755
index 1f765d69acc3..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread_16k_10.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Memcpy 16k 10 Threads (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="memcpy_thread"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="16 10 1"
-DATV="16k_10"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-
-perf record $PERFRECOPT -o "$DATA" "$BIN" $ARGS
-
-perf_dump_aux_verify "$DATA" 10 10 10
-
-err=$?
-exit $err
diff --git a/tools/perf/tests/shell/coresight/raw_dump_stress.sh b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
new file mode 100755
index 000000000000..025584472513
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/raw_dump_stress.sh
@@ -0,0 +1,48 @@
+#!/bin/bash -e
+# Coresight raw dump stress (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+if [ "$(id -u)" != 0 ]; then
+ # Requires root for larger buffer size
+ echo "[Skip] No root permission"
+ exit 2
+fi
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_raw_dump_stress.XXXXX)
+
+cleanup() {
+ rm -r "${tmpdir}"
+ trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+ cleanup
+ exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+# Use exit snapshot to record 2M of trace to make about 80MB of raw dump data.
+echo "Recording..."
+perf record -e cs_etm/timestamp=0/u -m,2M -Se -o "$tmpdir/data" -- \
+ perf test -w brstack 20000 > /dev/null 2>&1
+
+# Test raw dump runs to completion but don't decode because that's too slow for
+# a test
+echo "Dumping raw trace..."
+perf report --dump-raw-trace -i "$tmpdir/data" 2>/dev/null > "$tmpdir/rawdump"
+err=$?
+
+size=$(stat -c%s "$tmpdir/rawdump")
+if [ $size -gt $((50 * 1024 * 1024)) ]; then
+ echo "PASS: Raw dump file is larger than 50MB"
+ cleanup
+ exit 0
+fi
+
+echo "FAIL: Got less than 50MB (${size} bytes)"
+cleanup
+exit 1
--
2.34.1
* [PATCH v3 09/19] perf test: Add named_threads workload
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (7 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:26 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 10/19] perf test cs-etm: Test decoding for concurrent threads test James Clark
` (9 subsequent siblings)
18 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Add a workload that runs X threads that run a unique function named
"named_threads_thread[x]" which performs a multiplication in a loop for
Y loops. Each thread sets its name to "thread[x]".
This can be used to test that processor trace decoding handles
concurrent threads correctly and the correct symbols and thread names
are assigned to samples.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/Documentation/perf-test.txt | 5 +-
tools/perf/tests/builtin-test.c | 1 +
tools/perf/tests/tests.h | 1 +
tools/perf/tests/workloads/Build | 1 +
tools/perf/tests/workloads/named_threads.c | 109 +++++++++++++++++++++++++++++
5 files changed, 116 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-test.txt b/tools/perf/Documentation/perf-test.txt
index 7ec70c054cac..778c37f6efdb 100644
--- a/tools/perf/Documentation/perf-test.txt
+++ b/tools/perf/Documentation/perf-test.txt
@@ -57,7 +57,7 @@ OPTIONS
--workload=::
Run a built-in workload, to list them use '--list-workloads', current
ones include: noploop, thloop, leafloop, sqrtloop, brstack, datasym,
- context_switch_loop, deterministic and landlock.
+ context_switch_loop, deterministic, named_threads and landlock.
Used with the shell script regression tests.
@@ -66,6 +66,9 @@ OPTIONS
seconds: leafloop, noploop, sqrtloop, thloop
nrloops: brstack, context_switch_loop
+ 'named_threads' accepts the number of threads and the number of loops to
+ do in each thread.
+
The datasym, landlock and deterministic workloads don't accept any.
--list-workloads::
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 5a2ab67cd85d..2fee93858c86 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -149,6 +149,7 @@ static struct test_suite *generic_tests[] = {
static struct test_workload *workloads[] = {
&workload__noploop,
&workload__thloop,
+ &workload__named_threads,
&workload__leafloop,
&workload__sqrtloop,
&workload__brstack,
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index f8bba2d68769..ef3c3a269132 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -235,6 +235,7 @@ struct test_workload workload__##work = { \
/* The list of test workloads */
DECLARE_WORKLOAD(noploop);
DECLARE_WORKLOAD(thloop);
+DECLARE_WORKLOAD(named_threads);
DECLARE_WORKLOAD(leafloop);
DECLARE_WORKLOAD(sqrtloop);
DECLARE_WORKLOAD(brstack);
diff --git a/tools/perf/tests/workloads/Build b/tools/perf/tests/workloads/Build
index cca7ad354227..7db5eea713a3 100644
--- a/tools/perf/tests/workloads/Build
+++ b/tools/perf/tests/workloads/Build
@@ -2,6 +2,7 @@
perf-test-y += noploop.o
perf-test-y += thloop.o
+perf-test-y += named_threads.o
perf-test-y += leafloop.o
perf-test-y += sqrtloop.o
perf-test-y += brstack.o
diff --git a/tools/perf/tests/workloads/named_threads.c b/tools/perf/tests/workloads/named_threads.c
new file mode 100644
index 000000000000..dc8070a98df4
--- /dev/null
+++ b/tools/perf/tests/workloads/named_threads.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <limits.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <linux/compiler.h>
+#include "../tests.h"
+
+#define MAX_THREADS 25
+
+static int iterations = 500;
+int named_threads_work = 1234;
+
+typedef void *(*thread_fn_t)(void *);
+
+#define DEFINE_THREAD(n) \
+noinline void *named_threads_thread##n(void *arg __maybe_unused) \
+{ \
+ pthread_setname_np(pthread_self(), "thread" #n); \
+ for (int i = 0; i < iterations; i++) \
+ named_threads_work *= 3; \
+ \
+ return NULL; \
+}
+
+#define THREAD_LIST(macro) \
+ macro(1) \
+ macro(2) \
+ macro(3) \
+ macro(4) \
+ macro(5) \
+ macro(6) \
+ macro(7) \
+ macro(8) \
+ macro(9) \
+ macro(10) \
+ macro(11) \
+ macro(12) \
+ macro(13) \
+ macro(14) \
+ macro(15) \
+ macro(16) \
+ macro(17) \
+ macro(18) \
+ macro(19) \
+ macro(20) \
+ macro(21) \
+ macro(22) \
+ macro(23) \
+ macro(24) \
+ macro(25)
+
+#define DECLARE_THREAD(n) void *named_threads_thread##n(void *arg);
+
+THREAD_LIST(DECLARE_THREAD)
+THREAD_LIST(DEFINE_THREAD)
+
+#define THREAD_ENTRY(n) named_threads_thread##n,
+
+static thread_fn_t thread_fns[MAX_THREADS] = {
+ THREAD_LIST(THREAD_ENTRY)
+};
+
+/*
+ * Creates argv[0] threads named "thread[x]". Each runs a unique function called
+ * "named_threads_thread[x]" which performs a multiplication in a loop for argv[1] loops.
+ */
+static int named_threads(int argc, const char **argv)
+{
+ pthread_t threads[MAX_THREADS];
+ int nr_threads = 1;
+ int err = 0;
+
+ if (argc > 0)
+ nr_threads = atoi(argv[0]);
+
+ if (nr_threads <= 0 || nr_threads > MAX_THREADS) {
+ fprintf(stderr, "Error: num threads must be 1 - %d\n", MAX_THREADS);
+ return 1;
+ }
+
+ if (argc > 1)
+ iterations = atoi(argv[1]);
+
+ if (iterations < 0) {
+ fprintf(stderr, "Error: iterations must be non-negative\n");
+ return 1;
+ }
+
+ for (int i = 0; i < nr_threads; i++) {
+ int ret;
+
+ ret = pthread_create(&threads[i], NULL, thread_fns[i], NULL);
+ if (ret) {
+ fprintf(stderr, "Error: failed to create thread%d: %s\n",
+ i + 1, strerror(ret));
+ return 1;
+ }
+ }
+
+ for (int i = 0; i < nr_threads; i++)
+ pthread_join(threads[i], NULL);
+
+ return err;
+}
+
+DEFINE_WORKLOAD(named_threads);
--
2.34.1
* [PATCH v3 10/19] perf test cs-etm: Test decoding for concurrent threads test
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (8 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 09/19] perf test: Add named_threads workload James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 11/19] perf test cs-etm: Remove duplicate branch tests James Clark
` (8 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
The thread_loop test only looks for context IDs in the raw trace.
There's a lot more that can go wrong when decoding these, so replace it
with a test that looks at the final output for matching thread names and
symbols.
In the future we might use timestamps and context switch events to track
threads, so looking at context IDs in the raw trace wouldn't always
work.
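A minimal sketch of the kind of check this describes, assuming the
decoded output has the thread name and the symbol on the same line. The
helper name check_named_threads is hypothetical, and the pattern is
slightly stricter than the one in the script below: it also requires
that the symbol is not followed by another digit, so that thread1 cannot
be satisfied by named_threads_thread10:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every thread 1..N appears in the
# given file on a line holding both its thread name and its own symbol.
check_named_threads() {
	local script="$1" nthreads="$2" i

	for i in $(seq 1 "$nthreads"); do
		# ([^0-9]|$) stops "thread1" from matching "thread10"
		if ! grep -Eq "thread${i} .* named_threads_thread${i}([^0-9]|\$)" "$script"; then
			echo "Error: thread${i} missing" >&2
			return 1
		fi
	done
	return 0
}
```

It would be called on the perf script output, for example
check_named_threads "$tmpdir/script" 10.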
Signed-off-by: James Clark <james.clark@linaro.org>
---
.../tests/shell/coresight/concurrent_threads.sh | 45 ++++++++++++++++++++++
.../shell/coresight/thread_loop_check_tid_10.sh | 23 -----------
.../shell/coresight/thread_loop_check_tid_2.sh | 23 -----------
3 files changed, 45 insertions(+), 46 deletions(-)
diff --git a/tools/perf/tests/shell/coresight/concurrent_threads.sh b/tools/perf/tests/shell/coresight/concurrent_threads.sh
new file mode 100755
index 000000000000..bf34d4ee77a6
--- /dev/null
+++ b/tools/perf/tests/shell/coresight/concurrent_threads.sh
@@ -0,0 +1,45 @@
+#!/bin/bash -e
+# Coresight concurrent threads (exclusive)
+
+# SPDX-License-Identifier: GPL-2.0
+
+# If Coresight is not available, skip the test
+perf list pmu | grep -q cs_etm || exit 2
+
+tmpdir=$(mktemp -d /tmp/__perf_test.coresight_concurrent_threads.XXXXX)
+
+cleanup() {
+ rm -rf "${tmpdir}"
+ trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+ cleanup
+ exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
+cf="$tmpdir/ctl"
+af="$tmpdir/ack"
+mkfifo "$cf" "$af"
+
+nthreads=10
+
+# Timestamps off to reduce trace size, start disabled and use the control FIFO
+# to only record the workload and not startup.
+perf record -o "$tmpdir/data" -e cs_etm/timestamp=0/u -D -1 --control fifo:"$cf","$af" \
+ -- perf test --workload-ctl fifo:"$cf","$af" -w named_threads $nthreads 1 > /dev/null 2>&1
+
+perf script -i "$tmpdir/data" > "$tmpdir/script" 2>/dev/null
+
+# Check all threads were traced and they have the correct thread name and symbol
+for i in $(seq 1 $nthreads); do
+ if ! grep -q "thread${i} .* named_threads_thread${i}" "$tmpdir/script"; then
+ echo "Error: thread${i} missing" >&2
+ cleanup
+ exit 1
+ fi
+done
+
+cleanup
+exit 0
diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh
deleted file mode 100755
index 7f43a93a2ac2..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_10.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Thread Loop 10 Threads - Check TID (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="thread_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="10 1"
-DATV="check-tid-10th"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-STDO="$DATD/perf-$TEST-$DATV.stdout"
-
-SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO
-
-perf_dump_aux_tid_verify "$DATA" "$STDO"
-
-err=$?
-exit $err
diff --git a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh b/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh
deleted file mode 100755
index a94d2079ed06..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop_check_tid_2.sh
+++ /dev/null
@@ -1,23 +0,0 @@
-#!/bin/bash -e
-# CoreSight / Thread Loop 2 Threads - Check TID (exclusive)
-
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-TEST="thread_loop"
-
-# shellcheck source=../lib/coresight.sh
-. "$(dirname $0)"/../lib/coresight.sh
-
-ARGS="2 20"
-DATV="check-tid-2th"
-# shellcheck disable=SC2153
-DATA="$DATD/perf-$TEST-$DATV.data"
-STDO="$DATD/perf-$TEST-$DATV.stdout"
-
-SHOW_TID=1 perf record -s $PERFRECOPT -o "$DATA" "$BIN" $ARGS > $STDO
-
-perf_dump_aux_tid_verify "$DATA" "$STDO"
-
-err=$?
-exit $err
--
2.34.1
* [PATCH v3 11/19] perf test cs-etm: Remove duplicate branch tests
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (9 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 10/19] perf test cs-etm: Test decoding for concurrent threads test James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 12/19] perf test cs-etm: Skip if not root James Clark
` (7 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
We already test branch output in Perf script mode, but then retest it in
Perf report mode. This is more of a test of Perf itself than Coresight
because Perf uses the same samples to generate both outputs. We're also
already testing instruction output in Perf report mode.
Remove this test for a speedup. On the systemwide test, also remove the
Perf report instruction test because systemwide mode records a lot more
data, so running multiple tests on it has a big runtime impact.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/test_arm_coresight.sh | 18 +-----------------
1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index bbf89e944e7b..39553702c1f3 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -52,17 +52,6 @@ perf_script_branch_samples() {
grep -E " +$1 +[0-9]+ .* +branches:(.*:)? +" > /dev/null 2>&1
}
-perf_report_branch_samples() {
- echo "Looking at perf.data file for reporting branch samples:"
-
- # Below is an example of the branch samples reporting:
- # 73.04% 73.04% touch libc-2.27.so [.] _dl_addr
- # 7.71% 7.71% touch libc-2.27.so [.] getenv
- # 2.59% 2.59% touch ld-2.27.so [.] strcmp
- perf report --stdio -i ${perfdata} 2>&1 | \
- grep -E " +[0-9]+\.[0-9]+% +[0-9]+\.[0-9]+% +$1 " > /dev/null 2>&1
-}
-
perf_report_instruction_samples() {
echo "Looking at perf.data file for instruction samples:"
@@ -123,7 +112,6 @@ arm_cs_iterate_devices() {
record_touch_file $device_name $2 &&
perf_script_branch_samples touch &&
- perf_report_branch_samples touch &&
perf_report_instruction_samples touch
err=$?
@@ -154,9 +142,7 @@ arm_cs_etm_system_wide_test() {
# System-wide mode should include perf samples so test for that
# instead of ls
- perf_script_branch_samples perf &&
- perf_report_branch_samples perf &&
- perf_report_instruction_samples perf
+ perf_script_branch_samples perf
err=$?
arm_cs_report "CoreSight system wide testing" $err
@@ -179,7 +165,6 @@ arm_cs_etm_snapshot_test() {
wait $PERFPID
perf_script_branch_samples dd &&
- perf_report_branch_samples dd &&
perf_report_instruction_samples dd
err=$?
@@ -191,7 +176,6 @@ arm_cs_etm_basic_test() {
perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1
perf_script_branch_samples ls &&
- perf_report_branch_samples ls &&
perf_report_instruction_samples ls
err=$?
--
2.34.1
* [PATCH v3 12/19] perf test cs-etm: Skip if not root
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (10 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 11/19] perf test cs-etm: Remove duplicate branch tests James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 13/19] perf test cs-etm: Reduce snapshot size James Clark
` (6 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
This test requires root, so use the common idiom to skip it when not
running as root.
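The idiom can be sketched as a function, shown here for illustration
only. The name skip_unless_root and the optional UID argument are
assumptions added so the logic can be exercised without being root; the
patch itself uses an inline check:

```shell
#!/bin/bash
# Hypothetical helper: return the perf test "skip" code (2) unless the
# given user ID is root. Defaults to the current user's ID.
skip_unless_root() {
	local uid="${1:-$(id -u)}"

	if [ "$uid" != 0 ]; then
		echo "[Skip] No root permission"
		return 2
	fi
	return 0
}
```

A test script would call it as "skip_unless_root || exit 2" near the
top, before creating any temporary files.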
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/test_arm_coresight.sh | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 39553702c1f3..8ed2c934c87d 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -20,6 +20,12 @@ skip_if_no_cs_etm_event() {
skip_if_no_cs_etm_event || exit 2
+if [ "$(id -u)" != 0 ]; then
+ # Requires root for -C and system wide tests
+ echo "[Skip] No root permission"
+ exit 2
+fi
+
perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
file=$(mktemp /tmp/temporary_file.XXXXX)
--
2.34.1
* [PATCH v3 13/19] perf test cs-etm: Reduce snapshot size
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (11 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 12/19] perf test cs-etm: Skip if not root James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 14/19] perf test cs-etm: Speed up basic test James Clark
` (5 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
The default buffer size for root is 4MB, which is very slow to decode. We
only need a few KB to verify that the dd process is hit, so reduce the
size to 128KB.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index 8ed2c934c87d..da2f599393e2 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -156,7 +156,7 @@ arm_cs_etm_system_wide_test() {
arm_cs_etm_snapshot_test() {
echo "Recording trace with snapshot mode"
- perf record -o ${perfdata} -e cs_etm// -S \
+ perf record -o ${perfdata} -e cs_etm// -S -m,128K \
-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
PERFPID=$!
--
2.34.1
* [PATCH v3 14/19] perf test cs-etm: Speed up basic test
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (12 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 13/19] perf test cs-etm: Reduce snapshot size James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
` (4 subsequent siblings)
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Like the name says, this should be the most basic test possible. Kernel
recording is slow and already has coverage on the systemwide test. Perf
report output also has coverage elsewhere. 'ls' also produces more trace
than 'true'.
We only want to test if the combination of recording options works at
all, so fix all of these things to make it as fast as possible.
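For illustration, the six combinations exercised by the basic test (with
and without timestamps, in per-thread, system-wide and normal mode) can
be enumerated with a loop. The function name basic_test_args is
hypothetical, and the order differs from the explicit calls in the
script; the script keeps the calls written out rather than using a loop:

```shell
#!/bin/bash
# Print one argument list per combination of recording mode and
# timestamp setting. An empty mode means normal (default) mode.
basic_test_args() {
	local mode ts

	for mode in "--per-thread" "-a" ""; do
		for ts in 0 1; do
			# ${mode:+ $mode} appends the mode only when it is non-empty
			echo "-e cs_etm/timestamp=${ts}/u${mode:+ $mode}"
		done
	done
}

basic_test_args
```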
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/test_arm_coresight.sh | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/test_arm_coresight.sh
index da2f599393e2..83295a8fe179 100755
--- a/tools/perf/tests/shell/test_arm_coresight.sh
+++ b/tools/perf/tests/shell/test_arm_coresight.sh
@@ -179,10 +179,9 @@ arm_cs_etm_snapshot_test() {
arm_cs_etm_basic_test() {
echo "Recording trace with '$*'"
- perf record -o ${perfdata} "$@" -m,8M -- ls > /dev/null 2>&1
+ perf record -o ${perfdata} "$@" -- true > /dev/null 2>&1
- perf_script_branch_samples ls &&
- perf_report_instruction_samples ls
+ perf_script_branch_samples true
err=$?
arm_cs_report "CoreSight basic testing with '$*'" $err
@@ -246,12 +245,12 @@ arm_cs_etm_snapshot_test
# Test all combinations of per-thread, system-wide and normal mode with
# and without timestamps
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ --per-thread
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/ -a
-arm_cs_etm_basic_test -e cs_etm/timestamp=0/
-arm_cs_etm_basic_test -e cs_etm/timestamp=1/
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u --per-thread
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u -a
+arm_cs_etm_basic_test -e cs_etm/timestamp=0/u
+arm_cs_etm_basic_test -e cs_etm/timestamp=1/u
arm_cs_etm_sparse_cpus_test
--
2.34.1
* [PATCH v3 15/19] perf test cs-etm: Remove unused Coresight workloads
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (13 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 14/19] perf test cs-etm: Speed up basic test James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:41 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 16/19] perf test cs-etm: Make disassembly test use kcore James Clark
` (3 subsequent siblings)
18 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
These are now unused and had various issues, such as not working with
out-of-source builds and being slow to compile. Delete them.
Signed-off-by: James Clark <james.clark@linaro.org>
---
Documentation/trace/coresight/coresight-perf.rst | 78 +-----------
MAINTAINERS | 1 -
tools/perf/Makefile.perf | 14 +--
tools/perf/tests/shell/coresight/Makefile | 29 -----
.../perf/tests/shell/coresight/Makefile.miniconfig | 14 ---
.../tests/shell/coresight/asm_pure_loop/.gitignore | 1 -
.../tests/shell/coresight/asm_pure_loop/Makefile | 34 ------
.../shell/coresight/asm_pure_loop/asm_pure_loop.S | 30 -----
.../tests/shell/coresight/memcpy_thread/.gitignore | 1 -
.../tests/shell/coresight/memcpy_thread/Makefile | 33 -----
.../shell/coresight/memcpy_thread/memcpy_thread.c | 80 ------------
.../tests/shell/coresight/thread_loop/.gitignore | 1 -
.../tests/shell/coresight/thread_loop/Makefile | 33 -----
.../shell/coresight/thread_loop/thread_loop.c | 85 -------------
.../shell/coresight/unroll_loop_thread/.gitignore | 1 -
.../shell/coresight/unroll_loop_thread/Makefile | 33 -----
.../unroll_loop_thread/unroll_loop_thread.c | 75 ------------
tools/perf/tests/shell/lib/coresight.sh | 134 ---------------------
18 files changed, 5 insertions(+), 672 deletions(-)
diff --git a/Documentation/trace/coresight/coresight-perf.rst b/Documentation/trace/coresight/coresight-perf.rst
index 30be89320621..0a77741a431e 100644
--- a/Documentation/trace/coresight/coresight-perf.rst
+++ b/Documentation/trace/coresight/coresight-perf.rst
@@ -112,78 +112,6 @@ Example for triggering AUX pause and resume with PMU event::
Perf test - Verify kernel and userspace perf CoreSight work
-----------------------------------------------------------
-When you run perf test, it will do a lot of self tests. Some of those
-tests will cover CoreSight (only if enabled and on ARM64). You
-generally would run perf test from the tools/perf directory in the
-kernel tree. Some tests will check some internal perf support like:
-
- Check Arm CoreSight trace data recording and synthesized samples
- Check Arm SPE trace data recording and synthesized samples
-
-Some others will actually use perf record and some test binaries that
-are in tests/shell/coresight and will collect traces to ensure a
-minimum level of functionality is met. The scripts that launch these
-tests are in the same directory. These will all look like:
-
- CoreSight / ASM Pure Loop
- CoreSight / Memcpy 16k 10 Threads
- CoreSight / Thread Loop 10 Threads - Check TID
- etc.
-
-These perf record tests will not run if the tool binaries do not exist
-in tests/shell/coresight/\*/ and will be skipped. If you do not have
-CoreSight support in hardware then either do not build perf with
-CoreSight support or remove these binaries in order to not have these
-tests fail and have them skip instead.
-
-These tests will log historical results in the current working
-directory (e.g. tools/perf) and will be named stats-\*.csv like:
-
- stats-asm_pure_loop-out.csv
- stats-memcpy_thread-16k_10.csv
- ...
-
-These statistic files log some aspects of the AUX data sections in
-the perf data output counting some numbers of certain encodings (a
-good way to know that it's working in a very simple way). One problem
-with CoreSight is that given a large enough amount of data needing to
-be logged, some of it can be lost due to the processor not waking up
-in time to read out all the data from buffers etc.. You will notice
-that the amount of data collected can vary a lot per run of perf test.
-If you wish to see how this changes over time, simply run perf test
-multiple times and all these csv files will have more and more data
-appended to it that you can later examine, graph and otherwise use to
-figure out if things have become worse or better.
-
-This means sometimes these tests fail as they don't capture all the
-data needed. This is about tracking quality and amount of data
-produced over time and to see when changes to the Linux kernel improve
-quality of traces.
-
-Be aware that some of these tests take quite a while to run, specifically
-in processing the perf data file and dumping contents to then examine what
-is inside.
-
-You can change where these csv logs are stored by setting the
-PERF_TEST_CORESIGHT_STATDIR environment variable before running perf
-test like::
-
- export PERF_TEST_CORESIGHT_STATDIR=/var/tmp
- perf test
-
-They will also store resulting perf output data in the current
-directory for later inspection like::
-
- perf-asm_pure_loop-out.data
- perf-memcpy_thread-16k_10.data
- ...
-
-You can alter where the perf data files are stored by setting the
-PERF_TEST_CORESIGHT_DATADIR environment variable such as::
-
- PERF_TEST_CORESIGHT_DATADIR=/var/tmp
- perf test
-
-You may wish to set these above environment variables if you wish to
-keep the output of tests outside of the current working directory for
-longer term storage and examination.
+There is a set of Perf tests for CoreSight which can be run with::
+
+ sudo perf test coresight
diff --git a/MAINTAINERS b/MAINTAINERS
index b539be153f6a..7efb893edcbb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2751,7 +2751,6 @@ F: tools/perf/arch/arm/util/cs-etm.h
F: tools/perf/arch/arm/util/pmu.c
F: tools/perf/tests/shell/*coresight*
F: tools/perf/tests/shell/coresight/*
-F: tools/perf/tests/shell/lib/*coresight*
F: tools/perf/util/cs-etm-decoder/*
F: tools/perf/util/cs-etm.*
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 4ac2a0cec9ee..e4f8c979f47b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -508,16 +508,7 @@ arm64-sysreg-defs-clean:
$(Q)$(MAKE) -C $(arm64_gen_sysreg_dir) O=$(arm64_gen_sysreg_outdir) \
prefix= subdir= clean > /dev/null
-TESTS_CORESIGHT_DIR := $(srctree)/tools/perf/tests/shell/coresight
-
-tests-coresight-targets: FORCE
- $(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR)
-
-tests-coresight-targets-clean:
- $(call QUIET_CLEAN, coresight)
- $(Q)$(MAKE) -C $(TESTS_CORESIGHT_DIR) O=$(OUTPUT) clean >/dev/null
-
-all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS) tests-coresight-targets
+all: shell_compatibility_test $(ALL_PROGRAMS) $(LANG_BINDINGS) $(OTHER_PROGRAMS)
# Create python binding output directory if not already present
$(shell [ -d '$(OUTPUT)python' ] || mkdir -p '$(OUTPUT)python')
@@ -896,7 +887,6 @@ install-tests: all install-gtk
$(INSTALL) tests/shell/base_report/*.txt '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/base_report'; \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight' ; \
$(INSTALL) tests/shell/coresight/*.sh '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/tests/shell/coresight'
- $(Q)$(MAKE) -C tests/shell/coresight install-tests
install-bin: install-tools install-tests
@@ -939,7 +929,7 @@ endif
clean:: $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBSYMBOL)-clean $(LIBPERF)-clean \
arm64-sysreg-defs-clean fixdep-clean python-clean bpf-skel-clean \
- tests-coresight-targets-clean pmu-events-clean
+ pmu-events-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive \
$(OUTPUT)perf-iostat $(LANG_BINDINGS)
$(Q)find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '*.a' -delete -o \
diff --git a/tools/perf/tests/shell/coresight/Makefile b/tools/perf/tests/shell/coresight/Makefile
deleted file mode 100644
index fa08fd9a5991..000000000000
--- a/tools/perf/tests/shell/coresight/Makefile
+++ /dev/null
@@ -1,29 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../../../../../tools/scripts/Makefile.include
-include ../../../../../tools/scripts/Makefile.arch
-include ../../../../../tools/scripts/utilities.mak
-
-SUBDIRS = \
- asm_pure_loop \
- memcpy_thread \
- thread_loop \
- unroll_loop_thread
-
-all: $(SUBDIRS)
-$(SUBDIRS):
- @$(MAKE) -C $@ >/dev/null
-
-INSTALLDIRS = $(SUBDIRS:%=install-%)
-
-install-tests: $(INSTALLDIRS)
-$(INSTALLDIRS):
- @$(MAKE) -C $(@:install-%=%) install-tests >/dev/null
-
-CLEANDIRS = $(SUBDIRS:%=clean-%)
-
-clean: $(CLEANDIRS)
-$(CLEANDIRS):
- $(call QUIET_CLEAN, test-$(@:clean-%=%)) $(MAKE) -C $(@:clean-%=%) clean >/dev/null
-
-.PHONY: all clean $(SUBDIRS) $(CLEANDIRS) $(INSTALLDIRS)
diff --git a/tools/perf/tests/shell/coresight/Makefile.miniconfig b/tools/perf/tests/shell/coresight/Makefile.miniconfig
deleted file mode 100644
index 5f72a9cb43f3..000000000000
--- a/tools/perf/tests/shell/coresight/Makefile.miniconfig
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-ifndef DESTDIR
-prefix ?= $(HOME)
-endif
-
-DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
-INSTALL = install
-INSTDIR_SUB = tests/shell/coresight
-
-include ../../../../../scripts/Makefile.include
-include ../../../../../scripts/Makefile.arch
-include ../../../../../scripts/utilities.mak
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore b/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore
deleted file mode 100644
index 468673ac32e8..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-asm_pure_loop
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile b/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile
deleted file mode 100644
index 206849e92bc9..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/Makefile
+++ /dev/null
@@ -1,34 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=asm_pure_loop
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=
-
-all: $(BIN)
-
-$(BIN): $(BIN).S
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line - this is raw asm with no libc to have an always exact binary
- $(Q)$(CC) $(BIN).S -nostdlib -static -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
- $(call QUIET_INSTALL, tests) \
- $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
- $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
- $(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S b/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S
deleted file mode 100644
index 577760046772..000000000000
--- a/tools/perf/tests/shell/coresight/asm_pure_loop/asm_pure_loop.S
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Tamas Zsoldos <tamas.zsoldos@arm.com>, 2021 */
-
-.globl _start
-_start:
- mov x0, 0x0000ffff
- mov x1, xzr
-loop:
- nop
- nop
- cbnz x1, noskip
- nop
- nop
- adrp x2, skip
- add x2, x2, :lo12:skip
- br x2
- nop
- nop
-noskip:
- nop
- nop
-skip:
- sub x0, x0, 1
- cbnz x0, loop
-
- mov x0, #0
- mov x8, #93 // __NR_exit syscall
- svc #0
-
-.section .note.GNU-stack, "", @progbits
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore b/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore
deleted file mode 100644
index f8217e56091e..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-memcpy_thread
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile b/tools/perf/tests/shell/coresight/memcpy_thread/Makefile
deleted file mode 100644
index 2db637eb2c26..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=memcpy_thread
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
- $(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
- $(call QUIET_INSTALL, tests) \
- $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
- $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
- $(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c b/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c
deleted file mode 100644
index 7e879217be30..000000000000
--- a/tools/perf/tests/shell/coresight/memcpy_thread/memcpy_thread.c
+++ /dev/null
@@ -1,80 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-
-struct args {
- unsigned long loops;
- unsigned long size;
- pthread_t th;
- void *ret;
-};
-
-static void *thrfn(void *arg)
-{
- struct args *a = arg;
- unsigned long i, len = a->loops;
- unsigned char *src, *dst;
-
- src = malloc(a->size * 1024);
- dst = malloc(a->size * 1024);
- if ((!src) || (!dst)) {
- printf("ERR: Can't allocate memory\n");
- exit(1);
- }
- for (i = 0; i < len; i++)
- memcpy(dst, src, a->size * 1024);
-
- return NULL;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
- pthread_t t;
- pthread_attr_t attr;
-
- pthread_attr_init(&attr);
- pthread_create(&t, &attr, fn, arg);
- return t;
-}
-
-int main(int argc, char **argv)
-{
- unsigned long i, len, size, thr;
- struct args args[256];
- long long v;
-
- if (argc < 4) {
- printf("ERR: %s [copysize Kb] [numthreads] [numloops (hundreds)]\n", argv[0]);
- exit(1);
- }
-
- v = atoll(argv[1]);
- if ((v < 1) || (v > (1024 * 1024))) {
- printf("ERR: max memory 1GB (1048576 KB)\n");
- exit(1);
- }
- size = v;
- thr = atol(argv[2]);
- if ((thr < 1) || (thr > 256)) {
- printf("ERR: threads 1-256\n");
- exit(1);
- }
- v = atoll(argv[3]);
- if ((v < 1) || (v > 40000000000ll)) {
- printf("ERR: loops 1-40000000000 (hundreds)\n");
- exit(1);
- }
- len = v * 100;
- for (i = 0; i < thr; i++) {
- args[i].loops = len;
- args[i].size = size;
- args[i].th = new_thr(thrfn, &(args[i]));
- }
- for (i = 0; i < thr; i++)
- pthread_join(args[i].th, &(args[i].ret));
- return 0;
-}
diff --git a/tools/perf/tests/shell/coresight/thread_loop/.gitignore b/tools/perf/tests/shell/coresight/thread_loop/.gitignore
deleted file mode 100644
index 6d4c33eaa9e8..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-thread_loop
diff --git a/tools/perf/tests/shell/coresight/thread_loop/Makefile b/tools/perf/tests/shell/coresight/thread_loop/Makefile
deleted file mode 100644
index ea846c038e7a..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=thread_loop
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
- $(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
- $(call QUIET_INSTALL, tests) \
- $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
- $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
- $(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c b/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c
deleted file mode 100644
index 86f3f548b006..000000000000
--- a/tools/perf/tests/shell/coresight/thread_loop/thread_loop.c
+++ /dev/null
@@ -1,85 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-// define this for gettid()
-#define _GNU_SOURCE
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-#include <sys/syscall.h>
-#ifndef SYS_gettid
-// gettid is 178 on arm64
-# define SYS_gettid 178
-#endif
-#define gettid() syscall(SYS_gettid)
-
-struct args {
- unsigned int loops;
- pthread_t th;
- void *ret;
-};
-
-static void *thrfn(void *arg)
-{
- struct args *a = arg;
- int i = 0, len = a->loops;
-
- if (getenv("SHOW_TID")) {
- unsigned long long tid = gettid();
-
- printf("%llu\n", tid);
- }
- asm volatile(
- "loop:\n"
- "add %w[i], %w[i], #1\n"
- "cmp %w[i], %w[len]\n"
- "blt loop\n"
- : /* out */
- : /* in */ [i] "r" (i), [len] "r" (len)
- : /* clobber */
- );
- return (void *)(long)i;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
- pthread_t t;
- pthread_attr_t attr;
-
- pthread_attr_init(&attr);
- pthread_create(&t, &attr, fn, arg);
- return t;
-}
-
-int main(int argc, char **argv)
-{
- unsigned int i, len, thr;
- struct args args[256];
-
- if (argc < 3) {
- printf("ERR: %s [numthreads] [numloops (millions)]\n", argv[0]);
- exit(1);
- }
-
- thr = atoi(argv[1]);
- if ((thr < 1) || (thr > 256)) {
- printf("ERR: threads 1-256\n");
- exit(1);
- }
- len = atoi(argv[2]);
- if ((len < 1) || (len > 4000)) {
- printf("ERR: max loops 4000 (millions)\n");
- exit(1);
- }
- len *= 1000000;
- for (i = 0; i < thr; i++) {
- args[i].loops = len;
- args[i].th = new_thr(thrfn, &(args[i]));
- }
- for (i = 0; i < thr; i++)
- pthread_join(args[i].th, &(args[i].ret));
- return 0;
-}
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore b/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore
deleted file mode 100644
index 2cb4e996dbf3..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-unroll_loop_thread
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile b/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile
deleted file mode 100644
index 6264c4e3abd1..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/Makefile
+++ /dev/null
@@ -1,33 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-include ../Makefile.miniconfig
-
-# Binary to produce
-BIN=unroll_loop_thread
-# Any linking/libraries needed for the binary - empty if none needed
-LIB=-pthread
-
-all: $(BIN)
-
-$(BIN): $(BIN).c
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Build line
- $(Q)$(CC) $(BIN).c -o $(BIN) $(LIB)
-endif
-endif
-
-install-tests: all
-ifdef CORESIGHT
-ifeq ($(ARCH),arm64)
-# Install the test tool in the right place
- $(call QUIET_INSTALL, tests) \
- $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)'; \
- $(INSTALL) $(BIN) '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/$(INSTDIR_SUB)/$(BIN)/$(BIN)'
-endif
-endif
-
-clean:
- $(Q)$(RM) -f $(BIN)
-
-.PHONY: all clean install-tests
diff --git a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c b/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c
deleted file mode 100644
index 8f4e1c985ca3..000000000000
--- a/tools/perf/tests/shell/coresight/unroll_loop_thread/unroll_loop_thread.c
+++ /dev/null
@@ -1,75 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <string.h>
-#include <pthread.h>
-
-struct args {
- pthread_t th;
- unsigned int in;
- void *ret;
-};
-
-static void *thrfn(void *arg)
-{
- struct args *a = arg;
- unsigned int i, in = a->in;
-
- for (i = 0; i < 10000; i++) {
- asm volatile (
-// force an unroll of thia add instruction so we can test long runs of code
-#define SNIP1 "add %w[in], %w[in], #1\n"
-// 10
-#define SNIP2 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1 SNIP1
-// 100
-#define SNIP3 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2 SNIP2
-// 1000
-#define SNIP4 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3 SNIP3
-// 10000
-#define SNIP5 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4 SNIP4
-// 100000
- SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5 SNIP5
- : /* out */
- : /* in */ [in] "r" (in)
- : /* clobber */
- );
- }
-
- return NULL;
-}
-
-static pthread_t new_thr(void *(*fn) (void *arg), void *arg)
-{
- pthread_t t;
- pthread_attr_t attr;
-
- pthread_attr_init(&attr);
- pthread_create(&t, &attr, fn, arg);
- return t;
-}
-
-int main(int argc, char **argv)
-{
- unsigned int i, thr;
- struct args args[256];
-
- if (argc < 2) {
- printf("ERR: %s [numthreads]\n", argv[0]);
- exit(1);
- }
-
- thr = atoi(argv[1]);
- if ((thr > 256) || (thr < 1)) {
- printf("ERR: threads 1-256\n");
- exit(1);
- }
- for (i = 0; i < thr; i++) {
- args[i].in = rand();
- args[i].th = new_thr(thrfn, &(args[i]));
- }
- for (i = 0; i < thr; i++)
- pthread_join(args[i].th, &(args[i].ret));
- return 0;
-}
diff --git a/tools/perf/tests/shell/lib/coresight.sh b/tools/perf/tests/shell/lib/coresight.sh
deleted file mode 100644
index 184d62e7e5bd..000000000000
--- a/tools/perf/tests/shell/lib/coresight.sh
+++ /dev/null
@@ -1,134 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Carsten Haitzler <carsten.haitzler@arm.com>, 2021
-
-# This is sourced from a driver script so no need for #!/bin... etc. at the
-# top - the assumption below is that it runs as part of sourcing after the
-# test sets up some basic env vars to say what it is.
-
-# This currently works with ETMv4 / ETF not any other packet types at thi
-# point. This will need changes if that changes.
-
-# perf record options for the perf tests to use
-PERFRECMEM="-m ,16M"
-PERFRECOPT="$PERFRECMEM -e cs_etm//u"
-
-TOOLS=$(dirname $0)
-DIR="$TOOLS/$TEST"
-BIN="$DIR/$TEST"
-# If the test tool/binary does not exist and is executable then skip the test
-if ! test -x "$BIN"; then exit 2; fi
-# If CoreSight is not available, skip the test
-perf list pmu | grep -q cs_etm || exit 2
-DATD="."
-# If the data dir env is set then make the data dir use that instead of ./
-if test -n "$PERF_TEST_CORESIGHT_DATADIR"; then
- DATD="$PERF_TEST_CORESIGHT_DATADIR";
-fi
-# If the stat dir env is set then make the data dir use that instead of ./
-STATD="."
-if test -n "$PERF_TEST_CORESIGHT_STATDIR"; then
- STATD="$PERF_TEST_CORESIGHT_STATDIR";
-fi
-
-# Called if the test fails - error code 1
-err() {
- echo "$1"
- exit 1
-}
-
-# Check that some statistics from our perf
-check_val_min() {
- STATF="$4"
- if test "$2" -lt "$3"; then
- echo ", FAILED" >> "$STATF"
- err "Sanity check number of $1 is too low ($2 < $3)"
- fi
-}
-
-perf_dump_aux_verify() {
- # Some basic checking that the AUX chunk contains some sensible data
- # to see that we are recording something and at least a minimum
- # amount of it. We should almost always see Fn packets in just about
- # anything but certainly we will see some trace info and async
- # packets
- DUMP="$DATD/perf-tmp-aux-dump.txt"
- perf report --stdio --dump -i "$1" | \
- grep -o -e I_ATOM_F -e I_ASYNC -e I_TRACE_INFO > "$DUMP"
- # Simply count how many of these packets we find to see that we are
- # producing a reasonable amount of data - exact checks are not sane
- # as this is a lossy process where we may lose some blocks and the
- # compiler may produce different code depending on the compiler and
- # optimization options, so this is rough just to see if we're
- # either missing almost all the data or all of it
- ATOM_FX_NUM=$(grep -c I_ATOM_F "$DUMP")
- ASYNC_NUM=$(grep -c I_ASYNC "$DUMP")
- TRACE_INFO_NUM=$(grep -c I_TRACE_INFO "$DUMP")
- rm -f "$DUMP"
-
- # Arguments provide minimums for a pass
- CHECK_FX_MIN="$2"
- CHECK_ASYNC_MIN="$3"
- CHECK_TRACE_INFO_MIN="$4"
-
- # Write out statistics, so over time you can track results to see if
- # there is a pattern - for example we have less "noisy" results that
- # produce more consistent amounts of data each run, to see if over
- # time any techinques to minimize data loss are having an effect or
- # not
- STATF="$STATD/stats-$TEST-$DATV.csv"
- if ! test -f "$STATF"; then
- echo "ATOM Fx Count, Minimum, ASYNC Count, Minimum, TRACE INFO Count, Minimum" > "$STATF"
- fi
- echo -n "$ATOM_FX_NUM, $CHECK_FX_MIN, $ASYNC_NUM, $CHECK_ASYNC_MIN, $TRACE_INFO_NUM, $CHECK_TRACE_INFO_MIN" >> "$STATF"
-
- # Actually check to see if we passed or failed.
- check_val_min "ATOM_FX" "$ATOM_FX_NUM" "$CHECK_FX_MIN" "$STATF"
- check_val_min "ASYNC" "$ASYNC_NUM" "$CHECK_ASYNC_MIN" "$STATF"
- check_val_min "TRACE_INFO" "$TRACE_INFO_NUM" "$CHECK_TRACE_INFO_MIN" "$STATF"
- echo ", Ok" >> "$STATF"
-}
-
-perf_dump_aux_tid_verify() {
- # Specifically crafted test will produce a list of Tread ID's to
- # stdout that need to be checked to see that they have had trace
- # info collected in AUX blocks in the perf data. This will go
- # through all the TID's that are listed as CID=0xabcdef and see
- # that all the Thread IDs the test tool reports are in the perf
- # data AUX chunks
-
- # The TID test tools will print a TID per stdout line that are being
- # tested
- TIDS=$(cat "$2")
- # Scan the perf report to find the TIDs that are actually CID in hex
- # and build a list of the ones found
- FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \
- grep -o "CID=0x[0-9a-z]\+" | sed 's/CID=//g' | \
- uniq | sort | uniq)
- # No CID=xxx found - maybe your kernel is reporting these as
- # VMID=xxx so look there
- if test -z "$FOUND_TIDS"; then
- FOUND_TIDS=$(perf report --stdio --dump -i "$1" | \
- grep -o "VMID=0x[0-9a-z]\+" | sed 's/VMID=//g' | \
- uniq | sort | uniq)
- fi
-
- # Iterate over the list of TIDs that the test says it has and find
- # them in the TIDs found in the perf report
- MISSING=""
- for TID2 in $TIDS; do
- FOUND=""
- for TIDHEX in $FOUND_TIDS; do
- TID=$(printf "%i" $TIDHEX)
- if test "$TID" -eq "$TID2"; then
- FOUND="y"
- break
- fi
- done
- if test -z "$FOUND"; then
- MISSING="$MISSING $TID"
- fi
- done
- if test -n "$MISSING"; then
- err "Thread IDs $MISSING not found in perf AUX data"
- fi
-}
--
2.34.1
* [PATCH v3 16/19] perf test cs-etm: Make disassembly test use kcore
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (14 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:36 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 17/19] perf test cs-etm: Add all branch instructions to test James Clark
` (2 subsequent siblings)
18 siblings, 1 reply; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
Hits in modules return empty disassembly when vmlinux is the input to
objdump. Make the disassembly test more reliable by always using kcore,
and update the comments to say that the script supports this.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/scripts/python/arm-cs-trace-disasm.py | 20 ++++++++++----------
tools/perf/tests/shell/test_arm_coresight_disasm.sh | 2 +-
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index ba208c90d631..8f6fa4a007b4 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -18,29 +18,29 @@ from perf_trace_context import perf_sample_srccode, perf_config_get
# Below are some example commands for using this script.
# Note a --kcore recording is required for accurate decode
-# due to the alternatives patching mechanism. However this
-# script only supports reading vmlinux for disassembly dump,
-# meaning that any patched instructions will appear
-# as unpatched, but the instruction ranges themselves will
-# be correct. In addition to this, source line info comes
-# from Perf, and when using kcore there is no debug info. The
-# following lists the supported features in each mode:
+# due to the alternatives patching mechanism. In addition to this,
+# source line info comes from Perf, and when using kcore there is
+# no debug info. The following lists the supported features in each mode:
#
# +-----------+-----------------+------------------+------------------+
# | Recording | Accurate decode | Source line dump | Disassembly dump |
# +-----------+-----------------+------------------+------------------+
# | --kcore | yes | no | yes |
-# | normal | no | yes | yes |
+# | normal | no | yes (inaccurate) | yes (inaccurate) |
# +-----------+-----------------+------------------+------------------+
#
# Output disassembly with objdump and auto detect vmlinux
-# (when running on same machine.)
+# (when running on same machine.):
# perf script -s scripts/python/arm-cs-trace-disasm.py -d
#
# Output disassembly with llvm-objdump:
# perf script -s scripts/python/arm-cs-trace-disasm.py \
# -- -d llvm-objdump-11 -k path/to/vmlinux
#
+# Output accurate disassembly by passing kcore to script:
+# perf script -s scripts/python/arm-cs-trace-disasm.py \
+# -- -d -k perf.data/kcore_dir/kcore
+#
# Output only source line and symbols:
# perf script -s scripts/python/arm-cs-trace-disasm.py
@@ -57,7 +57,7 @@ def int_arg(v):
args = argparse.ArgumentParser()
args.add_argument("-k", "--vmlinux",
- help="Set path to vmlinux file. Omit to autodetect if running on same machine")
+ help="Set path to vmlinux or kcore file. Omit to autodetect if running on same machine")
args.add_argument("-d", "--objdump", nargs="?", const=default_objdump(),
help="Show disassembly. Can also be used to change the objdump path"),
args.add_argument("-v", "--verbose", action="store_true", help="Enable debugging log")
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 0dfb4fadf531..8b5c60a09012 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -46,7 +46,7 @@ if [ -e /proc/kcore ]; then
echo "Testing kernel disassembly"
perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
perf script -i ${perfdata} -s python:${script_path} -- \
- -d --stop-sample=30 2> /dev/null > ${file}
+ -d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
grep -q -e ${branch_search} ${file}
echo "Found kernel branches"
else
--
2.34.1
* [PATCH v3 17/19] perf test cs-etm: Add all branch instructions to test
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (15 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 16/19] perf test cs-etm: Make disassembly test use kcore James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 18/19] perf test cs-etm: Speed up disassembly test James Clark
2026-06-03 10:17 ` [PATCH v3 19/19] perf test cs-etm: Move existing tests to coresight folder James Clark
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
If we reduce the number of samples searched to speed up the test, then
there will be less chance of hitting one of these branches. Extend the
regex to cover all branches so the test will always pass.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/test_arm_coresight_disasm.sh | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 8b5c60a09012..5ee87eb7973e 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -38,8 +38,7 @@ cleanup_files()
trap cleanup_files EXIT TERM INT
# Ranges start and end on branches, so check for some likely branch instructions
-sep="\s\|\s"
-branch_search="\sbl${sep}b${sep}b.ne${sep}b.eq${sep}cbz\s"
+branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al))?|br|blr|ret|cbz|cbnz|tbz|tbnz|svc|eret)([[:space:]]|$)'
## Test kernel ##
if [ -e /proc/kcore ]; then
@@ -47,7 +46,7 @@ if [ -e /proc/kcore ]; then
perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
perf script -i ${perfdata} -s python:${script_path} -- \
-d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
- grep -q -e ${branch_search} ${file}
+ grep -q -E ${branch_search} ${file}
echo "Found kernel branches"
else
# kcore is required for correct kernel decode due to runtime code patching
@@ -59,7 +58,7 @@ echo "Testing userspace disassembly"
perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
perf script -i ${perfdata} -s python:${script_path} -- \
-d --stop-sample=30 2> /dev/null > ${file}
-grep -q -e ${branch_search} ${file}
+grep -q -E ${branch_search} ${file}
echo "Found userspace branches"
glb_err=0
--
2.34.1
* [PATCH v3 18/19] perf test cs-etm: Speed up disassembly test
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (16 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 17/19] perf test cs-etm: Add all branch instructions to test James Clark
@ 2026-06-03 10:17 ` James Clark
2026-06-03 10:17 ` [PATCH v3 19/19] perf test cs-etm: Move existing tests to coresight folder James Clark
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
We can use exit snapshot mode (-Se) to limit the amount of trace to
decode here too. Each call to objdump is also quite expensive on kcore,
so limit the test to 2 samples instead of 30. The test only needs to
detect the case where there is no data at all.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/test_arm_coresight_disasm.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
index 5ee87eb7973e..2a400fd38a36 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/test_arm_coresight_disasm.sh
@@ -43,9 +43,9 @@ branch_search='[[:space:]](bl|b(\.(eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)
## Test kernel ##
if [ -e /proc/kcore ]; then
echo "Testing kernel disassembly"
- perf record -o ${perfdata} -e cs_etm//k --kcore -- touch $file > /dev/null 2>&1
+ perf record -o ${perfdata} -e cs_etm//k --kcore -Se -m,64K -- touch $file > /dev/null 2>&1
perf script -i ${perfdata} -s python:${script_path} -- \
- -d --stop-sample=30 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
+ -d --stop-sample=2 -k ${perfdata}/kcore_dir/kcore 2> /dev/null > ${file}
grep -q -E ${branch_search} ${file}
echo "Found kernel branches"
else
@@ -55,9 +55,9 @@ fi
## Test user ##
echo "Testing userspace disassembly"
-perf record -o ${perfdata} -e cs_etm//u -- touch $file > /dev/null 2>&1
+perf record -o ${perfdata} -e cs_etm//u -Se -m,64K -- touch $file > /dev/null 2>&1
perf script -i ${perfdata} -s python:${script_path} -- \
- -d --stop-sample=30 2> /dev/null > ${file}
+ -d --stop-sample=2 2> /dev/null > ${file}
grep -q -E ${branch_search} ${file}
echo "Found userspace branches"
--
2.34.1
* [PATCH v3 19/19] perf test cs-etm: Move existing tests to coresight folder
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
` (17 preceding siblings ...)
2026-06-03 10:17 ` [PATCH v3 18/19] perf test cs-etm: Speed up disassembly test James Clark
@ 2026-06-03 10:17 ` James Clark
18 siblings, 0 replies; 28+ messages in thread
From: James Clark @ 2026-06-03 10:17 UTC (permalink / raw)
To: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Amir Ayupov, Jonathan Corbet,
Shuah Khan, Paschalis Mpeis
Cc: coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc, James Clark
There is a subfolder for Coresight tests, so keep them all in there.
Signed-off-by: James Clark <james.clark@linaro.org>
---
MAINTAINERS | 1 -
tools/perf/tests/shell/{ => coresight}/test_arm_coresight.sh | 0
tools/perf/tests/shell/{ => coresight}/test_arm_coresight_disasm.sh | 2 +-
3 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 7efb893edcbb..ff8935b459ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2749,7 +2749,6 @@ F: tools/perf/arch/arm/util/auxtrace.c
F: tools/perf/arch/arm/util/cs-etm.c
F: tools/perf/arch/arm/util/cs-etm.h
F: tools/perf/arch/arm/util/pmu.c
-F: tools/perf/tests/shell/*coresight*
F: tools/perf/tests/shell/coresight/*
F: tools/perf/util/cs-etm-decoder/*
F: tools/perf/util/cs-etm.*
diff --git a/tools/perf/tests/shell/test_arm_coresight.sh b/tools/perf/tests/shell/coresight/test_arm_coresight.sh
similarity index 100%
rename from tools/perf/tests/shell/test_arm_coresight.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight.sh
diff --git a/tools/perf/tests/shell/test_arm_coresight_disasm.sh b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
similarity index 96%
rename from tools/perf/tests/shell/test_arm_coresight_disasm.sh
rename to tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
index 2a400fd38a36..b196aab709f8 100755
--- a/tools/perf/tests/shell/test_arm_coresight_disasm.sh
+++ b/tools/perf/tests/shell/coresight/test_arm_coresight_disasm.sh
@@ -24,7 +24,7 @@ perfdata_dir=$(mktemp -d /tmp/__perf_test.perf.data.XXXXX)
perfdata=${perfdata_dir}/perf.data
file=$(mktemp /tmp/temporary_file.XXXXX)
# Relative path works whether it's installed or running from repo
-script_path=$(dirname "$0")/../../scripts/python/arm-cs-trace-disasm.py
+script_path=$(dirname "$0")/../../../scripts/python/arm-cs-trace-disasm.py
cleanup_files()
{
--
2.34.1
* Re: [PATCH v3 05/19] perf test: Add deterministic workload
2026-06-03 10:17 ` [PATCH v3 05/19] perf test: Add deterministic workload James Clark
@ 2026-06-03 10:24 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 10:24 UTC (permalink / raw)
To: James Clark; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Signed integer overflow in 'dt_work' multiplications.
--
commit 89c17c87c140a5545531995cc6d206a8c6fffd71
Author: James Clark <james.clark@linaro.org>
perf test: Add deterministic workload
This commit adds a deterministic workload to the perf test suite for testing
CPU trace decoding. It introduces a new test file that performs repeated
multiplications on a global variable to ensure a predictable execution path.
> diff --git a/tools/perf/tests/workloads/deterministic.c b/tools/perf/tests/workloads/deterministic.c
> new file mode 100644
> index 0000000000000..3caea8564043c
> --- /dev/null
> +++ b/tools/perf/tests/workloads/deterministic.c
> @@ -0,0 +1,39 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/compiler.h>
> +#include "../tests.h"
> +
> +int dt_work = 1234;
[Severity: Medium]
Will this signed integer overflow during the repeated multiplications?
Since dt_work is a 32-bit signed integer initialized to 1234 and is multiplied
by 7 repeatedly across deterministic(), function1(), and function2(), the 8th
multiplication operation attempts to evaluate to 7,113,764,434.
This exceeds the maximum signed 32-bit integer limit of 2,147,483,647 and
triggers undefined behavior in C.
Could dt_work be declared as an unsigned int instead? If the perf tools are
compiled with Undefined Behavior Sanitizer enabled, this signed integer
overflow will cause the test workload to crash.
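The arithmetic above can be checked with a small sketch. It is not part of
the patch: count_safe_mults() and mul_wrapping() are hypothetical helpers,
and only the starting value 1234 and the multiplier 7 come from the workload.
The first helper counts how many multiplications fit in a signed 32-bit
value without ever overflowing; the second shows that the same sequence on an
unsigned type wraps modulo 2^32, which is defined behavior.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: count how many times 'value'
 * can be multiplied by 'factor' before the result would exceed INT32_MAX.
 * The bound is checked before each multiplication, so no overflow occurs.
 * Assumes value > 0 and factor > 1.
 */
int count_safe_mults(int32_t value, int32_t factor)
{
	int n = 0;

	while (value <= INT32_MAX / factor) {
		value *= factor;
		n++;
	}
	return n;
}

/*
 * The same sequence on an unsigned type. Overflow wraps modulo 2^32,
 * which is well defined in C (assuming a 32-bit 'int', as on arm64
 * and x86-64 Linux).
 */
uint32_t mul_wrapping(uint32_t value, uint32_t factor, int times)
{
	for (int i = 0; i < times; i++)
		value *= factor;
	return value;
}
```

Under these assumptions count_safe_mults(1234, 7) returns 7, so the 8th
multiplication is the first that cannot be represented in a signed 32-bit
int, while mul_wrapping(1234, 7, 8) simply wraps.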
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org?part=5
* Re: [PATCH v3 09/19] perf test: Add named_threads workload
2026-06-03 10:17 ` [PATCH v3 09/19] perf test: Add named_threads workload James Clark
@ 2026-06-03 10:26 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 10:26 UTC (permalink / raw)
To: James Clark; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] Data race and signed integer overflow on the global `named_threads_work` variable leading to Undefined Behavior.
- [Medium] Lack of a synchronization barrier may result in sequential rather than concurrent thread execution.
- [Medium] Background threads are leaked on the `pthread_create` error path.
--
commit d8bc49305dcacce453a6a52dcb27371073bfc5b0
Author: James Clark <james.clark@linaro.org>
perf test: Add named_threads workload
Add a workload that runs X threads, each executing a unique function that
performs multiplication in a loop. This is intended to test that processor
trace decoding handles concurrent threads correctly.
> diff --git a/tools/perf/tests/workloads/named_threads.c b/tools/perf/tests/workloads/named_threads.c
> new file mode 100644
> index 0000000000000..dc8070a98df4c
> --- /dev/null
> +++ b/tools/perf/tests/workloads/named_threads.c
> @@ -0,0 +1,109 @@
[ ... ]
> +static int iterations = 500;
> +int named_threads_work = 1234;
> +
> +typedef void *(*thread_fn_t)(void *);
> +
> +#define DEFINE_THREAD(n) \
> +noinline void *named_threads_thread##n(void *arg __maybe_unused) \
> +{ \
> + pthread_setname_np(pthread_self(), "thread" #n); \
> + for (int i = 0; i < iterations; i++) \
> + named_threads_work *= 3; \
[Severity: Medium]
Does this loop trigger undefined behavior due to data races and signed
integer overflow?
Multiple threads concurrently modify the global named_threads_work without
synchronization. Additionally, multiplying 1234 by 3 for 500 iterations
will overflow the signed integer.
Could modern C compilers optimize this loop out entirely due to the UB,
preventing the generation of the processor trace samples this test relies
on?
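One possible way to remove both sources of undefined behavior is sketched
below. It is an assumption, not the author's method: shared_work and
atomic_mul() are hypothetical names, and C11 atomics are swapped in for the
plain global. The counter is unsigned so wraparound is defined, and the
multiplication is done with a compare-exchange loop so concurrent updates do
not race. Whether the extra atomic instructions are acceptable in a trace
workload is a separate question.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the shared global; unsigned so wrapping is defined. */
atomic_uint shared_work = 1234;

/*
 * Atomically multiply '*target' by 'factor' using a compare-exchange loop.
 * When the exchange fails, 'old' is refreshed with the current value and
 * the product is recomputed on the next iteration.
 */
void atomic_mul(atomic_uint *target, unsigned int factor)
{
	unsigned int old = atomic_load(target);

	while (!atomic_compare_exchange_weak(target, &old, old * factor))
		;
}
```

A thread body would then call atomic_mul(&shared_work, 3) inside its loop
instead of multiplying the plain int directly.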
> + \
> + return NULL; \
> +}
[ ... ]
> +static int named_threads(int argc, const char **argv)
> +{
[ ... ]
> + for (int i = 0; i < nr_threads; i++) {
> + int ret;
> +
> + ret = pthread_create(&threads[i], NULL, thread_fns[i], NULL);
[Severity: Medium]
Will these threads actually run concurrently as intended?
The overhead of pthread_create is typically longer than the time it takes
for the thread to execute its workload, especially since the loop only
executes 500 simple multiplications.
Without a synchronization barrier (like pthread_barrier_wait), earlier
threads might complete their loops and exit before the main thread finishes
spawning the remaining threads, resulting in sequential execution rather
than overlapping concurrent tasks.
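A minimal sketch of holding the workers back until all of them exist is
shown below. It makes several assumptions: NR_THREADS, worker() and
run_workers_together() are hypothetical names, and a start gate built from a
mutex and condition variable is used in place of pthread_barrier_wait() so
that the gate can still be opened if a pthread_create() call fails partway
through.

```c
#include <pthread.h>
#include <stdbool.h>

#define NR_THREADS 4	/* illustrative value, not from the patch */

static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static bool gate_open;
static int started[NR_THREADS];

/* Each worker blocks at the gate until the main thread opens it. */
static void *worker(void *arg)
{
	int *slot = arg;

	pthread_mutex_lock(&gate_lock);
	while (!gate_open)
		pthread_cond_wait(&gate_cond, &gate_lock);
	pthread_mutex_unlock(&gate_lock);

	*slot = 1;	/* the real workload loop would run here */
	return NULL;
}

/* Returns 0 on success, or the error code of the failing pthread_create(). */
int run_workers_together(void)
{
	pthread_t threads[NR_THREADS];
	int created, ret = 0;

	for (created = 0; created < NR_THREADS; created++) {
		ret = pthread_create(&threads[created], NULL, worker,
				     &started[created]);
		if (ret)
			break;
	}

	/* Open the gate even on failure so waiting workers can exit. */
	pthread_mutex_lock(&gate_lock);
	gate_open = true;
	pthread_cond_broadcast(&gate_cond);
	pthread_mutex_unlock(&gate_lock);

	for (int i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	return ret;
}
```

With this shape no worker begins its loop until thread creation has
finished, so the loops can overlap in time regardless of how long
pthread_create() takes.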
> + if (ret) {
> + fprintf(stderr, "Error: failed to create thread%d: %s\n",
> + i + 1, strerror(ret));
> + return 1;
[Severity: Medium]
Does this error path leak previously created background threads?
If pthread_create fails for any thread index greater than 0, the function
returns 1 immediately. It doesn't join the successfully created threads from
earlier iterations.
If perf test is run in non-forking mode (e.g., with --dont-fork), these
leaked background threads could persist and interfere with subsequent tests.
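The cleanup being asked for could look roughly like the sketch below. All
names are hypothetical: spawn_and_join() is not in the patch, and the
create_fn_t parameter plus the limited_create() stub exist only so that a
creation failure can be simulated. The point is that the join loop runs on
both the success and the error path, over exactly the threads that were
created.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Same signature as pthread_create(), so a failing stub can be substituted. */
typedef int (*create_fn_t)(pthread_t *, const pthread_attr_t *,
			   void *(*)(void *), void *);

/*
 * Create 'nr' threads running 'fn'. If a creation fails, the threads that
 * were already started are still joined before the error is returned.
 */
int spawn_and_join(create_fn_t create, void *(*fn)(void *), void *arg,
		   int nr, pthread_t *threads)
{
	int created, ret = 0;

	for (created = 0; created < nr; created++) {
		ret = create(&threads[created], NULL, fn, arg);
		if (ret)
			break;
	}

	/* Runs on both the success and the error path. */
	for (int i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	return ret;
}

/* Illustration-only stub: fails with EAGAIN once 'create_budget' hits 0. */
int create_budget = -1;	/* negative means unlimited */

int limited_create(pthread_t *t, const pthread_attr_t *attr,
		   void *(*fn)(void *), void *arg)
{
	if (create_budget == 0)
		return EAGAIN;
	if (create_budget > 0)
		create_budget--;
	return pthread_create(t, attr, fn, arg);
}

/* Illustration-only worker: records that it ran to completion. */
atomic_int finished;

void *count_worker(void *arg)
{
	(void)arg;
	atomic_fetch_add(&finished, 1);
	return NULL;
}
```

In real code pthread_create itself would be passed as the first argument;
the stub is only there to exercise the error path.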
> + }
> + }
> +
> + for (int i = 0; i < nr_threads; i++)
> + pthread_join(threads[i], NULL);
> +
> + return err;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org?part=9
* Re: [PATCH v3 03/19] perf test: Add a workload that forces context switches
2026-06-03 10:17 ` [PATCH v3 03/19] perf test: Add a workload that forces context switches James Clark
@ 2026-06-03 10:28 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 10:28 UTC (permalink / raw)
To: James Clark; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Calling exit(1) on I/O failure in the parent process abruptly terminates the test wrapper and bypasses workload control cleanup.
--
commit 4bc96eb1d44f39f8252cd4bbe7f402600bc7ca26
Author: James Clark <james.clark@linaro.org>
perf test: Add a workload that forces context switches
This workload launches two processes that block when reading and writing
to each other forcing the other process to be scheduled for each
read/write pair.
> diff --git a/tools/perf/tests/workloads/context_switch_loop.c b/tools/perf/tests/workloads/context_switch_loop.c
> new file mode 100644
> index 0000000000000..173d770ae6192
> --- /dev/null
> +++ b/tools/perf/tests/workloads/context_switch_loop.c
> @@ -0,0 +1,101 @@
[ ... ]
> +#define write_block(fd) \
> + do { \
> + if (write(fd, &buf, 1) <= 0) \
> + exit(1); \
> + } while (0)
> +
> +#define read_block(fd) \
> + do { \
> + if (read(fd, &buf, 1) <= 0) \
> + exit(1); \
> + } while (0)
[ ... ]
> + if (!proc1_pid) {
> + close(a_to_b[0]);
> + close(b_to_a[1]);
> + prctl(PR_SET_NAME, "proc1", 0, 0, 0);
> + context_switch_loop_proc1(b_to_a[0], a_to_b[1]);
> + close(a_to_b[1]);
> + close(b_to_a[0]);
> + exit(0);
> + }
> +
> + close(a_to_b[1]);
> + close(b_to_a[0]);
> + prctl(PR_SET_NAME, "proc2", 0, 0, 0);
> + context_switch_loop_proc2(a_to_b[0], b_to_a[1]);
[Severity: Medium]
Since context_switch_loop_proc2() runs in the parent process, does calling
exit(1) on I/O failures abruptly terminate the entire perf test wrapper?
If write_block() or read_block() hits an error such as EINTR, it calls
exit(1) directly. Because context_switch_loop_proc2() is executed in the main
process rather than a forked child, this bypasses the workload control cleanup
in run_workload().
Should the parent process return an error code instead of calling exit(1) so
perf test can clean up gracefully?
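One way to let the parent report failure instead of exiting, as a minimal
sketch. The helpers write_byte(), read_byte() and ping_pong() are hypothetical
replacements for the write_block()/read_block() macros, and retrying on EINTR
is an assumption about the desired behavior rather than something the patch
specifies.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write one byte, retrying on EINTR. 0 on success. */
static int write_byte(int fd)
{
	char buf = 0;
	ssize_t n;

	do {
		n = write(fd, &buf, 1);
	} while (n < 0 && errno == EINTR);

	return n == 1 ? 0 : -1;
}

/* Hypothetical helper: read one byte, retrying on EINTR. 0 on success. */
static int read_byte(int fd)
{
	char buf;
	ssize_t n;

	do {
		n = read(fd, &buf, 1);
	} while (n < 0 && errno == EINTR);

	return n == 1 ? 0 : -1;
}

/* Hypothetical parent-side loop: propagate failure instead of calling exit(). */
static int ping_pong(int in_fd, int out_fd, int iterations)
{
	for (int i = 0; i < iterations; i++) {
		if (write_byte(out_fd) || read_byte(in_fd))
			return 1;
	}
	return 0;
}
```

With this shape the caller can close its pipe ends and reap the child before
returning the error, so the cleanup in run_workload() still runs.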
> + close(a_to_b[0]);
> + close(b_to_a[1]);
> +
> + if (waitpid(proc1_pid, &status, 0) != proc1_pid || !WIFEXITED(status) ||
> + WEXITSTATUS(status))
> + return 1;
> +
> + return 0;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org?part=3
* Re: [PATCH v3 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test
2026-06-03 10:17 ` [PATCH v3 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
@ 2026-06-03 10:30 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 10:30 UTC (permalink / raw)
To: James Clark; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Signed integer overflow in `deterministic` workload causes Undefined Behavior.
--
commit 4589a67d2fd8e0ec516cb0f0c6ed8dd3e2079b76
Author: James Clark <james.clark@linaro.org>
perf test cs-etm: Replace unroll loop thread with deterministic decode test
This commit replaces the unroll loop thread test with a new deterministic
decode test. It introduces a shell script that records and validates the
deterministic workload to ensure decoding always yields the same result for
a given code sequence.
> diff --git a/tools/perf/tests/shell/coresight/deterministic.sh b/tools/perf/tests/shell/coresight/deterministic.sh
> new file mode 100755
> index 0000000000000..52e033fd6b82a
> --- /dev/null
> +++ b/tools/perf/tests/shell/coresight/deterministic.sh
> @@ -0,0 +1,71 @@
> +#!/bin/bash -e
[ ... ]
> +cf="$tmpdir/ctl"
> +af="$tmpdir/ack"
> +mkfifo "$cf" "$af"
> +
> +# Start disabled and use the control FIFO to only record the workload and not
> +# startup.
> +perf record -o "$tmpdir/data" -e cs_etm//u -D -1 --control fifo:"$cf","$af" -- \
> + perf test --workload-ctl fifo:"$cf","$af" -w deterministic > /dev/null 2>&1
[Severity: Medium]
Does executing the deterministic workload here cause a regression by
triggering undefined behavior?
Looking at the workload in tools/perf/tests/workloads/deterministic.c,
dt_work is initialized to 1234 and then multiplied by 7 twelve times
across deterministic(), function1(), and function2():
tools/perf/tests/workloads/deterministic.c:deterministic() {
    dt_work *= 7;
    dt_work *= 7;
    dt_work *= 7;
    function1();
    dt_work *= 7;
    dt_work *= 7;
    ...
On the 8th execution (the second multiplication after returning from
function1() to deterministic()), the value reaches 7,113,764,434, which
exceeds the maximum value for a 32-bit signed integer.
Signed integer overflow is undefined behavior in C. When perf is built with
the undefined behavior sanitizer, this workload can report the overflow or
abort, depending on the sanitizer options, which breaks the test's strictly
deterministic execution and results in test failures.
Could dt_work be modified to an unsigned int to avoid this overflow?
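The arithmetic can be checked with a small standalone sketch. It assumes a
32-bit int, and the helper names are hypothetical; only the starting value
1234 and the multiplier 7 come from the workload described above.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Count how many times 'value' can be multiplied by 7 before the product
 * would exceed INT_MAX. The check is done in 64 bits to avoid overflow.
 */
static int safe_multiplications(int64_t value)
{
	int count = 0;

	while (value * 7 <= INT_MAX) {
		value *= 7;
		count++;
	}
	return count;
}

/* Same sequence with unsigned arithmetic: wraps modulo 2^32, well defined. */
static uint32_t multiply_unsigned(uint32_t value, int times)
{
	while (times-- > 0)
		value *= 7u;
	return value;
}
```

Starting from 1234, seven multiplications fit in a signed 32-bit int and the
eighth does not. With uint32_t the eighth product wraps to 2818797138 instead
of overflowing, so the sequence stays deterministic.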
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org?part=6
* Re: [PATCH v3 01/19] perf cs-etm: Queue context packets for frontend
2026-06-03 10:17 ` [PATCH v3 01/19] " James Clark
@ 2026-06-03 10:32 ` sashiko-bot
2026-06-03 20:14 ` Amir Ayupov
1 sibling, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 10:32 UTC (permalink / raw)
To: James Clark; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Removing `prev_packet_thread` breaks branch sample cross-context instruction decoding when transitions cross machine boundaries (e.g., hypervisor to guest).
- [Medium] `cs_etm__synth_instruction_sample()` ignores its new `packet` argument and hardcodes `tidq->packet`, causing instruction copying to fail during trace flush.
--
commit 588f2ae34f926c74679fbf78b98515da3192e351
Author: James Clark <james.clark@linaro.org>
perf cs-etm: Queue context packets for frontend
This commit adds a CS_ETM_CONTEXT packet carrying the TID and exception
level to the frontend, keeping context changes ordered with the packet stream.
It separates the memory access functions for the frontend and decoder backend,
and removes tracking for the previous packet's thread.
Link: https://lore.kernel.org/linux-perf-users/20260515021135.1729028-1-aaupov@meta.com/
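The ordering idea can be illustrated with a toy model, not the perf
implementation. The toy_packet type and toy_attribute_ranges() are
hypothetical; the sketch only shows that when context changes travel in the
same queue as range packets, each range is attributed to the context that
preceded it in the stream.

```c
#include <stddef.h>

enum toy_type { TOY_RANGE, TOY_CONTEXT };

struct toy_packet {
	enum toy_type type;
	int tid;	/* only meaningful for TOY_CONTEXT */
};

/*
 * Consume packets in order. Each range is attributed to the most recent
 * context packet seen before it; 'out' receives one tid per range packet.
 * Returns the number of range packets processed.
 */
static size_t toy_attribute_ranges(const struct toy_packet *q, size_t n,
				   int initial_tid, int *out)
{
	int frontend_tid = initial_tid;
	size_t nr_ranges = 0;

	for (size_t i = 0; i < n; i++) {
		if (q[i].type == TOY_CONTEXT)
			frontend_tid = q[i].tid;
		else
			out[nr_ranges++] = frontend_tid;
	}
	return nr_ranges;
}
```

In this model, ranges queued before a context packet keep the old tid even if
they are consumed after the decoder has already seen the context change.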
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 40c6ddfa8c8d9..ce570913669c1 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
[ ... ]
> @@ -1158,6 +1151,30 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
> return ret;
> }
>
> +static u32 cs_etm__frontend_mem_access(struct cs_etm_queue *etmq,
> + struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet,
> + u64 address, size_t size, u8 *buffer)
> +{
> + return __cs_etm__mem_access(etmq, address, size, buffer, 0, packet->el,
> + tidq->frontend_thread);
> +}
[Severity: Medium]
Will this mix the previous packet's exception level with the newly updated
frontend thread?
When tracing virtualized environments, an exception return from a hypervisor
to a guest causes a machine context change. The decoder emits a context
packet which updates the frontend thread to the guest thread before
generating the branch sample for the hypervisor's execution range.
By passing the newly updated frontend thread to the memory access function
while using the previous packet's exception level, the map lookup could
search the guest's map group for a host kernel address, fail, and leave the
sample without instruction bytes.
[ ... ]
> @@ -1579,6 +1614,7 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
>
> static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet,
> u64 addr, u64 period)
> {
> int ret = 0;
[ ... ]
> @@ -1604,7 +1640,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> sample.cpu = tidq->packet->cpu;
> sample.flags = tidq->prev_packet->flags;
> sample.cpumode = event->sample.header.misc;
>
> - cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
> + cs_etm__copy_insn(etmq, tidq, tidq->packet, &sample);
[Severity: Medium]
Does this ignore the newly added packet argument?
The function signature was updated to accept a packet argument so it can
process the previous packet during a flush. However, the function body
still hardcodes tidq->packet for sample.cpu and in the call to
cs_etm__copy_insn().
When called from a flush, tidq->packet is typically a discontinuity or
empty packet. This could cause cs_etm__copy_insn() to bail out early and
silently drop the instruction bytes for the flushed sample.
[ ... ]
> @@ -1649,15 +1685,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>
> event->sample.header.type = PERF_RECORD_SAMPLE;
> event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
> - tidq->prev_packet_el);
> + tidq->prev_packet->el);
> event->sample.header.size = sizeof(struct perf_event_header);
>
> /* Set time field based on etm auxtrace config. */
> sample.time = cs_etm__resolve_sample_time(etmq, tidq);
>
> sample.ip = ip;
> - sample.pid = thread__pid(tidq->prev_packet_thread);
> - sample.tid = thread__tid(tidq->prev_packet_thread);
> + sample.pid = thread__pid(tidq->frontend_thread);
> + sample.tid = thread__tid(tidq->frontend_thread);
[Severity: Medium]
Does the removal of the previous packet thread tracking break cross-context
branch samples?
By using the newly updated frontend thread instead of the previous packet
thread here, it appears a branch sample from the previous context could be
misattributed to the new thread when transitions cross machine boundaries.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org?part=1
* Re: [PATCH v3 16/19] perf test cs-etm: Make disassembly test use kcore
2026-06-03 10:17 ` [PATCH v3 16/19] perf test cs-etm: Make disassembly test use kcore James Clark
@ 2026-06-03 10:36 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 10:36 UTC (permalink / raw)
To: James Clark; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The arm-cs-trace-disasm.py script fails to use the explicitly provided kcore file for dynamically loaded kernel modules, silently falling back to the unpatched build-id ELF.
--
commit fb1d5bc3ba8d47081f18c5b9c0c07af6646b6389
Author: James Clark <james.clark@linaro.org>
perf test cs-etm: Make disassembly test use kcore
Hits in modules return empty disassembly with vmlinux as an input to objdump.
Make the disassembly test more reliable by always using kcore. And update the
comments to say that this is supported by the script.
> diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> index ba208c90d6311..8f6fa4a007b42 100755
> --- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
> +++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
> @@ -18,29 +18,29 @@ from perf_trace_context import perf_sample_srccode, perf_config_get
>
> # Below are some example commands for using this script.
> # Note a --kcore recording is required for accurate decode
> -# due to the alternatives patching mechanism. However this
> -# script only supports reading vmlinux for disassembly dump,
> -# meaning that any patched instructions will appear
> -# as unpatched, but the instruction ranges themselves will
> -# be correct. In addition to this, source line info comes
> -# from Perf, and when using kcore there is no debug info. The
> -# following lists the supported features in each mode:
> +# due to the alternatives patching mechanism. In addition to this,
> +# source line info comes from Perf, and when using kcore there is
> +# no debug info. The following lists the supported features in each mode:
> #
> # +-----------+-----------------+------------------+------------------+
> # | Recording | Accurate decode | Source line dump | Disassembly dump |
> # +-----------+-----------------+------------------+------------------+
> # | --kcore | yes | no | yes |
> -# | normal | no | yes | yes |
> +# | normal | no | yes (inaccurate) | yes (inaccurate) |
> # +-----------+-----------------+------------------+------------------+
[Severity: Medium]
While this commit advertises kcore support, does arm-cs-trace-disasm.py
actually use the kcore file for dynamically loaded kernel modules?
Looking at get_dso_file_path() in
tools/perf/scripts/python/arm-cs-trace-disasm.py, if a module (e.g., [ext4])
is encountered, it bypasses options.vmlinux completely and attempts to use
the unpatched ELF from the build-id directory:
    if (dso_name == "[kernel.kallsyms]" or dso_name == "vmlinux"):
        if (options.vmlinux):
            return options.vmlinux;
        else:
            return find_vmlinux() if find_vmlinux() else dso_name
    if (dso_name == "[vdso]") :
        append = "/vdso"
    else:
        append = "/elf"
    dso_path = os.environ['PERF_BUILDID_DIR'] + "/" + dso_name + "/" + dso_build_id + append;
Additionally, if get_dso_file_path() were changed to return the kcore file
for modules, would process_event() still fail?
process_event() sets dso_vm_start to the module's map start address:
    if (dso == "[kernel.kallsyms]" or dso_start == 0x400000):
        dso_vm_start = 0
        map_pgoff = 0
    else:
        dso_vm_start = int(dso_start)
The subsequent read_disam() call then subtracts this from the sample address
to pass relative file offsets to objdump. This works for module ELFs but
breaks kcore, which requires absolute virtual addresses.
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org?part=16
* Re: [PATCH v3 15/19] perf test cs-etm: Remove unused Coresight workloads
2026-06-03 10:17 ` [PATCH v3 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
@ 2026-06-03 10:41 ` sashiko-bot
0 siblings, 0 replies; 28+ messages in thread
From: sashiko-bot @ 2026-06-03 10:41 UTC (permalink / raw)
To: James Clark; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The function `cs_etm__synth_instruction_sample` takes a `packet` argument but ignores it in favor of `tidq->packet` for critical fields, leading to incorrect instruction synthesis.
- [Medium] Incorrect validation of the double pointer `thread` in `cs_etm__etmq_update_thread` breaks the fallback logic and creates a potential segmentation fault hazard.
--
commit b1b29e2b51869a73781c3576698bb30e8e7b0cb9
Author: James Clark <james.clark@linaro.org>
perf test cs-etm: Remove unused Coresight workloads
This commit removes unused CoreSight workloads from the perf tests. These
workloads had issues with out-of-source builds and slow compilation times.
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 40c6ddfa8c8d9..ce570913669c1 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1490,34 +1508,51 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
[ ... ]
> +/*
> + * Convert a raw thread number to a thread struct and assign it to **thread.
> + */
> +static int cs_etm__etmq_update_thread(struct cs_etm_queue *etmq,
> + ocsd_ex_level el, pid_t tid,
> + struct thread **thread)
> {
> struct machine *machine = cs_etm__get_machine(etmq, el);
>
> + if (!machine || !*thread)
> + return -EINVAL;
[Severity: Medium]
Will this early return prevent the fallback logic from executing?
If *thread is initially NULL (e.g. from an allocation failure), this check
returns -EINVAL immediately instead of reaching the fallback assignment
*thread = machine__idle_thread(machine) at the bottom of the function.
Additionally, checking !*thread before checking if the pointer thread
itself is valid could lead to a segmentation fault if thread is NULL. Should
this verify !thread instead, or should the !*thread check be removed to
allow the idle thread fallback?
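A minimal sketch of the second alternative, using toy stand-ins rather than
the perf types. It validates the double pointer itself and lets a NULL *thread
reach the idle-thread fallback. Reference counting (thread__zput() and
friends) is deliberately left out, and all toy_* names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a thread object; not the perf struct thread. */
struct toy_thread {
	int tid;
};

static struct toy_thread idle_thread = { .tid = 0 };
static struct toy_thread known_thread = { .tid = 42 };

/* Toy lookup: only tid 42 is known. */
static struct toy_thread *toy_find_thread(int tid)
{
	return tid == known_thread.tid ? &known_thread : NULL;
}

static int toy_update_thread(int tid, struct toy_thread **thread)
{
	/* Reject a NULL out-pointer, but accept a NULL *thread. */
	if (!thread)
		return -EINVAL;

	if (tid != -1)
		*thread = toy_find_thread(tid);

	/* Couldn't find a known thread, fall back to the idle thread. */
	if (!*thread)
		*thread = &idle_thread;

	return 0;
}
```

With this check, a caller whose thread pointer starts out NULL still ends up
with a usable thread instead of an -EINVAL return.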
> +
> if (tid != -1) {
> - thread__zput(tidq->thread);
> - tidq->thread = machine__find_thread(machine, -1, tid);
> + thread__zput(*thread);
> + *thread = machine__find_thread(machine, -1, tid);
> }
>
> /* Couldn't find a known thread */
> - if (!tidq->thread)
> - tidq->thread = machine__idle_thread(machine);
> + if (!*thread)
> + *thread = machine__idle_thread(machine);
>
> - tidq->el = el;
> + return 0;
> }
[ ... ]
> @@ -1579,6 +1614,7 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
>
> static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet,
> u64 addr, u64 period)
> {
> int ret = 0;
[ ... ]
> sample.flags = tidq->prev_packet->flags;
> sample.cpumode = event->sample.header.misc;
>
> - cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
> + cs_etm__copy_insn(etmq, tidq, tidq->packet, &sample);
[Severity: High]
Is the correct packet being used for instruction synthesis here?
The function signature was updated to take a struct cs_etm_packet *packet
argument, but this line explicitly references tidq->packet instead of the
passed packet. Since callers like cs_etm__flush() and cs_etm__end_block()
pass tidq->prev_packet as the argument, referencing tidq->packet here will
use stale or incorrect packet data, potentially corrupting synthesized trace
data.
>
> if (etm->synth_opts.last_branch)
> sample.branch_stack = tidq->last_branch;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260603-james-cs-context-tracking-fix-v3-0-c392945d9ed5@linaro.org?part=15
* Re: [PATCH v3 01/19] perf cs-etm: Queue context packets for frontend
2026-06-03 10:17 ` [PATCH v3 01/19] " James Clark
2026-06-03 10:32 ` sashiko-bot
@ 2026-06-03 20:14 ` Amir Ayupov
1 sibling, 0 replies; 28+ messages in thread
From: Amir Ayupov @ 2026-06-03 20:14 UTC (permalink / raw)
To: James Clark
Cc: Suzuki K Poulose, Mike Leach, Leo Yan, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Jonathan Corbet, Shuah Khan,
Paschalis Mpeis, coresight, linux-perf-users, linux-kernel,
Arnaldo Carvalho de Melo, linux-doc
Hi James,
I tested the v2 patch series and it looks good. Compared to my
stamping pid/tid on each packet, there was a minor difference in 2/39
tested perf data files: the number of brstack samples differs by one,
however, there was no loss of binary profile. The resulting BOLT
profile converted from the perf script output was identical, so I'm OK
with v2 patch as-is.
-Amir
On Wed, Jun 3, 2026 at 3:17 AM James Clark <james.clark@linaro.org> wrote:
>
> PE_CONTEXT elements update the context ID and exception level, but the
> decoder may still have prior packets cached for frontend processing.
> Updating the context immediately in the decoder backend can make those
> cached packets get consumed with the wrong thread or EL state.
>
> Add a CS_ETM_CONTEXT packet carrying the TID and EL to the frontend,
> this keeps context changes ordered with the rest of the packet stream
> and avoids mismatches when synthesizing samples from cached packets.
>
> Separate the memory access function into one for the frontend and one
> for decoding. The frontend also needs memory access to attach the
> instruction to samples. Because the frontend does memory access for
> both previous and current packets, change all the frontend memory access
> function signatures to take both a tidq and packet. But backend always
> uses the current backend EL and thread from the tidq.
>
> Treat context packets as a boundary for branch sample generation and
> remove tidq->prev_packet_thread because it's not possible to branch to a
> different thread, so only tracking the current thread is required for
> sample generation.
>
> Fixes: e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight traces")
> Reported-by: Amir Ayupov <aaupov@meta.com>
> Closes: https://lore.kernel.org/linux-perf-users/20260515021135.1729028-1-aaupov@meta.com/
> Co-authored-by: James Clark <james.clark@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@arm.com>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 21 ++-
> tools/perf/util/cs-etm.c | 234 ++++++++++++++----------
> tools/perf/util/cs-etm.h | 8 +-
> 3 files changed, 162 insertions(+), 101 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> index dee3020ceaa9..26940f1f1b0b 100644
> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> @@ -402,6 +402,8 @@ cs_etm_decoder__buffer_packet(struct cs_etm_queue *etmq,
> packet_queue->packet_buffer[et].flags = 0;
> packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
> packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
> + packet_queue->packet_buffer[et].el = ocsd_EL_unknown;
> + packet_queue->packet_buffer[et].tid = -1;
>
> if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
> return OCSD_RESP_WAIT;
> @@ -449,6 +451,7 @@ cs_etm_decoder__buffer_range(struct cs_etm_queue *etmq,
> packet->last_instr_type = elem->last_i_type;
> packet->last_instr_subtype = elem->last_i_subtype;
> packet->last_instr_cond = elem->last_instr_cond;
> + packet->el = elem->context.exception_level;
>
> if (elem->last_i_type == OCSD_INSTR_BR || elem->last_i_type == OCSD_INSTR_BR_INDIRECT)
> packet->last_instr_taken_branch = elem->last_instr_exec;
> @@ -525,7 +528,9 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
> const ocsd_generic_trace_elem *elem,
> const uint8_t trace_chan_id)
> {
> + struct cs_etm_packet *packet;
> pid_t tid = -1;
> + int ret;
>
> /*
> * Process the PE_CONTEXT packets if we have a valid contextID or VMID.
> @@ -546,12 +551,18 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
> break;
> }
>
> - if (cs_etm__etmq_set_tid_el(etmq, tid, trace_chan_id,
> - elem->context.exception_level))
> + if (cs_etm__etmq_update_decode_context(etmq, trace_chan_id,
> + elem->context.exception_level, tid))
> return OCSD_RESP_FATAL_SYS_ERR;
>
> - if (tid == -1)
> - return OCSD_RESP_CONT;
> + ret = cs_etm_decoder__buffer_packet(etmq, packet_queue, trace_chan_id,
> + CS_ETM_CONTEXT);
> + if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
> + return ret;
> +
> + packet = &packet_queue->packet_buffer[packet_queue->tail];
> + packet->tid = tid;
> + packet->el = elem->context.exception_level;
>
> /*
> * A timestamp is generated after a PE_CONTEXT element so make sure
> @@ -559,7 +570,7 @@ cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
> */
> cs_etm_decoder__reset_timestamp(packet_queue);
>
> - return OCSD_RESP_CONT;
> + return ret;
> }
>
> static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 40c6ddfa8c8d..ce570913669c 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -85,15 +85,22 @@ struct cs_etm_traceid_queue {
> u64 period_instructions;
> size_t last_branch_pos;
> union perf_event *event_buf;
> - struct thread *thread;
> - struct thread *prev_packet_thread;
> - ocsd_ex_level prev_packet_el;
> - ocsd_ex_level el;
> struct branch_stack *last_branch;
> struct branch_stack *last_branch_rb;
> struct cs_etm_packet *prev_packet;
> struct cs_etm_packet *packet;
> struct cs_etm_packet_queue packet_queue;
> +
> + struct thread *decode_thread;
> + ocsd_ex_level decode_el;
> +
> + /*
> + * The frontend accesses the EL from '[prev_]packet' because it needs
> + * previous EL for branch and current EL for instruction samples. It's
> + * not possible to change thread in a single branch sample so no need to
> + * store or access the thread through the packet.
> + */
> + struct thread *frontend_thread;
> };
>
> enum cs_etm_format {
> @@ -614,10 +621,11 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
>
> queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
> tidq->trace_chan_id = trace_chan_id;
> - tidq->el = tidq->prev_packet_el = ocsd_EL_unknown;
> - tidq->thread = machine__findnew_thread(&etm->session->machines.host, -1,
> + tidq->decode_el = ocsd_EL_unknown;
> + tidq->frontend_thread = machine__findnew_thread(&etm->session->machines.host, -1,
> + queue->tid);
> + tidq->decode_thread = machine__findnew_thread(&etm->session->machines.host, -1,
> queue->tid);
> - tidq->prev_packet_thread = machine__idle_thread(&etm->session->machines.host);
>
> tidq->packet = zalloc(sizeof(struct cs_etm_packet));
> if (!tidq->packet)
> @@ -750,21 +758,10 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
> /*
> * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> * the next incoming packet.
> - *
> - * Threads and exception levels are also tracked for both the
> - * previous and current packets. This is because the previous
> - * packet is used for the 'from' IP for branch samples, so the
> - * thread at that time must also be assigned to that sample.
> - * Across discontinuity packets the thread can change, so by
> - * tracking the thread for the previous packet the branch sample
> - * will have the correct info.
> */
> tmp = tidq->packet;
> tidq->packet = tidq->prev_packet;
> tidq->prev_packet = tmp;
> - tidq->prev_packet_el = tidq->el;
> - thread__put(tidq->prev_packet_thread);
> - tidq->prev_packet_thread = thread__get(tidq->thread);
> }
> }
>
> @@ -937,8 +934,8 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
>
> /* Free this traceid_queue from the array */
> tidq = etmq->traceid_queues[idx];
> - thread__zput(tidq->thread);
> - thread__zput(tidq->prev_packet_thread);
> + thread__zput(tidq->frontend_thread);
> + thread__zput(tidq->decode_thread);
> zfree(&tidq->event_buf);
> zfree(&tidq->last_branch);
> zfree(&tidq->last_branch_rb);
> @@ -1083,47 +1080,43 @@ static u8 cs_etm__cpu_mode(struct cs_etm_queue *etmq, u64 address,
> }
> }
>
> -static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
> - u64 address, size_t size, u8 *buffer,
> - const ocsd_mem_space_acc_t mem_space)
> +static u32 __cs_etm__mem_access(struct cs_etm_queue *etmq,
> + u64 address, size_t size, u8 *buffer,
> + const ocsd_mem_space_acc_t mem_space,
> + ocsd_ex_level el, struct thread *thread)
> {
> u8 cpumode;
> u64 offset;
> int len;
> struct addr_location al;
> struct dso *dso;
> - struct cs_etm_traceid_queue *tidq;
> int ret = 0;
>
> if (!etmq)
> return 0;
>
> addr_location__init(&al);
> - tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
> - if (!tidq)
> - goto out;
>
> /*
> - * We've already tracked EL along side the PID in cs_etm__set_thread()
> - * so double check that it matches what OpenCSD thinks as well. It
> - * doesn't distinguish between EL0 and EL1 for this mem access callback
> - * so we had to do the extra tracking. Skip validation if it's any of
> - * the 'any' values.
> + * We track EL for the frontend and the backend when receiving context
> + * and range packets. OpenCSD doesn't distinguish between EL0 and EL1
> + * for this mem access callback so we had to do the extra tracking. Skip
> + * validation if it's any of the 'any' values.
> */
> if (!(mem_space == OCSD_MEM_SPACE_ANY ||
> mem_space == OCSD_MEM_SPACE_N || mem_space == OCSD_MEM_SPACE_S)) {
> if (mem_space & OCSD_MEM_SPACE_EL1N) {
> /* Includes both non secure EL1 and EL0 */
> - assert(tidq->el == ocsd_EL1 || tidq->el == ocsd_EL0);
> + assert(el == ocsd_EL1 || el == ocsd_EL0);
> } else if (mem_space & OCSD_MEM_SPACE_EL2)
> - assert(tidq->el == ocsd_EL2);
> + assert(el == ocsd_EL2);
> else if (mem_space & OCSD_MEM_SPACE_EL3)
> - assert(tidq->el == ocsd_EL3);
> + assert(el == ocsd_EL3);
> }
>
> - cpumode = cs_etm__cpu_mode(etmq, address, tidq->el);
> + cpumode = cs_etm__cpu_mode(etmq, address, el);
>
> - if (!thread__find_map(tidq->thread, cpumode, address, &al))
> + if (!thread__find_map(thread, cpumode, address, &al))
> goto out;
>
> dso = map__dso(al.map);
> @@ -1138,7 +1131,7 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
>
> map__load(al.map);
>
> - len = dso__data_read_offset(dso, maps__machine(thread__maps(tidq->thread)),
> + len = dso__data_read_offset(dso, maps__machine(thread__maps(thread)),
> offset, buffer, size);
>
> if (len <= 0) {
> @@ -1158,6 +1151,30 @@ static u32 cs_etm__mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
> return ret;
> }
>
> +static u32 cs_etm__frontend_mem_access(struct cs_etm_queue *etmq,
> + struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet,
> + u64 address, size_t size, u8 *buffer)
> +{
> + return __cs_etm__mem_access(etmq, address, size, buffer, 0, packet->el,
> + tidq->frontend_thread);
> +}
> +
> +static u32 cs_etm__decoder_mem_access(struct cs_etm_queue *etmq, u8 trace_chan_id,
> + u64 address, size_t size, u8 *buffer,
> + const ocsd_mem_space_acc_t mem_space)
> +{
> + struct cs_etm_traceid_queue *tidq;
> +
> + tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
> + if (!tidq)
> + return 0;
> +
> + return __cs_etm__mem_access(etmq, address, size, buffer,
> + mem_space, tidq->decode_el,
> + tidq->decode_thread);
> +}
> +
> static struct cs_etm_queue *cs_etm__alloc_queue(void)
> {
> struct cs_etm_queue *etmq = zalloc(sizeof(*etmq));
> @@ -1333,12 +1350,13 @@ void cs_etm__reset_last_branch_rb(struct cs_etm_traceid_queue *tidq)
> }
>
> static inline int cs_etm__t32_instr_size(struct cs_etm_queue *etmq,
> - u8 trace_chan_id, u64 addr)
> + struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet, u64 addr)
> {
> u8 instrBytes[2];
>
> - cs_etm__mem_access(etmq, trace_chan_id, addr, ARRAY_SIZE(instrBytes),
> - instrBytes, 0);
> + cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
> + ARRAY_SIZE(instrBytes), instrBytes);
> /*
> * T32 instruction size is indicated by bits[15:11] of the first
> * 16-bit word of the instruction: 0b11101, 0b11110 and 0b11111
> @@ -1371,16 +1389,16 @@ u64 cs_etm__last_executed_instr(const struct cs_etm_packet *packet)
> }
>
> static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
> - u64 trace_chan_id,
> - const struct cs_etm_packet *packet,
> + struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet,
> u64 offset)
> {
> if (packet->isa == CS_ETM_ISA_T32) {
> u64 addr = packet->start_addr;
>
> while (offset) {
> - addr += cs_etm__t32_instr_size(etmq,
> - trace_chan_id, addr);
> + addr += cs_etm__t32_instr_size(etmq, tidq, packet,
> + addr);
> offset--;
> }
> return addr;
> @@ -1490,34 +1508,51 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
> return etmq->buf_len;
> }
>
> -static void cs_etm__set_thread(struct cs_etm_queue *etmq,
> - struct cs_etm_traceid_queue *tidq, pid_t tid,
> - ocsd_ex_level el)
> +/*
> + * Convert a raw thread number to a thread struct and assign it to **thread.
> + */
> +static int cs_etm__etmq_update_thread(struct cs_etm_queue *etmq,
> + ocsd_ex_level el, pid_t tid,
> + struct thread **thread)
> {
> struct machine *machine = cs_etm__get_machine(etmq, el);
>
> + if (!machine || !*thread)
> + return -EINVAL;
> +
> if (tid != -1) {
> - thread__zput(tidq->thread);
> - tidq->thread = machine__find_thread(machine, -1, tid);
> + thread__zput(*thread);
> + *thread = machine__find_thread(machine, -1, tid);
> }
>
> /* Couldn't find a known thread */
> - if (!tidq->thread)
> - tidq->thread = machine__idle_thread(machine);
> + if (!*thread)
> + *thread = machine__idle_thread(machine);
>
> - tidq->el = el;
> + return 0;
> }
>
> -int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
> - u8 trace_chan_id, ocsd_ex_level el)
> +/*
> + * Set the thread and EL of the decode context which is ahead in time of the
> + * frontend context.
> + */
> +int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
> + u8 trace_chan_id,
> + ocsd_ex_level el, pid_t tid)
> {
> struct cs_etm_traceid_queue *tidq;
> + int ret;
>
> tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
> if (!tidq)
> return -EINVAL;
>
> - cs_etm__set_thread(etmq, tidq, tid, el);
> + ret = cs_etm__etmq_update_thread(etmq, el, tid,
> + &tidq->decode_thread);
> + if (ret)
> + return ret;
> +
> + tidq->decode_el = el;
> return 0;
> }
>
> @@ -1527,8 +1562,8 @@ bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq)
> }
>
> static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
> - u64 trace_chan_id,
> - const struct cs_etm_packet *packet,
> + struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet,
> struct perf_sample *sample)
> {
> /*
> @@ -1545,14 +1580,14 @@ static void cs_etm__copy_insn(struct cs_etm_queue *etmq,
> * cs_etm__t32_instr_size().
> */
> if (packet->isa == CS_ETM_ISA_T32)
> - sample->insn_len = cs_etm__t32_instr_size(etmq, trace_chan_id,
> + sample->insn_len = cs_etm__t32_instr_size(etmq, tidq, packet,
> sample->ip);
> /* Otherwise, A64 and A32 instruction size are always 32-bit. */
> else
> sample->insn_len = 4;
>
> - cs_etm__mem_access(etmq, trace_chan_id, sample->ip, sample->insn_len,
> - (void *)sample->insn, 0);
> + cs_etm__frontend_mem_access(etmq, tidq, packet, sample->ip,
> + sample->insn_len, (void *)sample->insn);
> }
>
> u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp)
> @@ -1579,6 +1614,7 @@ static inline u64 cs_etm__resolve_sample_time(struct cs_etm_queue *etmq,
>
> static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet,
> u64 addr, u64 period)
> {
> int ret = 0;
> @@ -1588,15 +1624,15 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>
> perf_sample__init(&sample, /*all=*/true);
> event->sample.header.type = PERF_RECORD_SAMPLE;
> - event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
> + event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, packet->el);
> event->sample.header.size = sizeof(struct perf_event_header);
>
> /* Set time field based on etm auxtrace config. */
> sample.time = cs_etm__resolve_sample_time(etmq, tidq);
>
> sample.ip = addr;
> - sample.pid = thread__pid(tidq->thread);
> - sample.tid = thread__tid(tidq->thread);
> + sample.pid = thread__pid(tidq->frontend_thread);
> + sample.tid = thread__tid(tidq->frontend_thread);
> sample.id = etmq->etm->instructions_id;
> sample.stream_id = etmq->etm->instructions_id;
> sample.period = period;
> @@ -1604,7 +1640,7 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> sample.flags = tidq->prev_packet->flags;
> sample.cpumode = event->sample.header.misc;
>
> - cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
> + cs_etm__copy_insn(etmq, tidq, tidq->packet, &sample);
>
> if (etm->synth_opts.last_branch)
> sample.branch_stack = tidq->last_branch;
> @@ -1649,15 +1685,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
>
> event->sample.header.type = PERF_RECORD_SAMPLE;
> event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
> - tidq->prev_packet_el);
> + tidq->prev_packet->el);
> event->sample.header.size = sizeof(struct perf_event_header);
>
> /* Set time field based on etm auxtrace config. */
> sample.time = cs_etm__resolve_sample_time(etmq, tidq);
>
> sample.ip = ip;
> - sample.pid = thread__pid(tidq->prev_packet_thread);
> - sample.tid = thread__tid(tidq->prev_packet_thread);
> + sample.pid = thread__pid(tidq->frontend_thread);
> + sample.tid = thread__tid(tidq->frontend_thread);
> sample.addr = cs_etm__first_executed_instr(tidq->packet);
> sample.id = etmq->etm->branches_id;
> sample.stream_id = etmq->etm->branches_id;
> @@ -1666,8 +1702,7 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
> sample.flags = tidq->prev_packet->flags;
> sample.cpumode = event->sample.header.misc;
>
> - cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet,
> - &sample);
> + cs_etm__copy_insn(etmq, tidq, tidq->prev_packet, &sample);
>
> /*
> * perf report cannot handle events without a branch stack
> @@ -1788,7 +1823,6 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> {
> struct cs_etm_auxtrace *etm = etmq->etm;
> int ret;
> - u8 trace_chan_id = tidq->trace_chan_id;
> u64 instrs_prev;
>
> /* Get instructions remainder from previous packet */
> @@ -1874,10 +1908,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> * been executed, but PC has not advanced to next
> * instruction)
> */
> - addr = cs_etm__instr_addr(etmq, trace_chan_id,
> - tidq->packet, offset - 1);
> + addr = cs_etm__instr_addr(etmq, tidq, tidq->packet,
> + offset - 1);
> ret = cs_etm__synth_instruction_sample(
> - etmq, tidq, addr,
> + etmq, tidq, tidq->packet, addr,
> etm->instructions_sample_period);
> if (ret)
> return ret;
> @@ -1959,7 +1993,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
> addr = cs_etm__last_executed_instr(tidq->prev_packet);
>
> err = cs_etm__synth_instruction_sample(
> - etmq, tidq, addr,
> + etmq, tidq, tidq->prev_packet, addr,
> tidq->period_instructions);
> if (err)
> return err;
> @@ -2014,7 +2048,7 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
> addr = cs_etm__last_executed_instr(tidq->prev_packet);
>
> err = cs_etm__synth_instruction_sample(
> - etmq, tidq, addr,
> + etmq, tidq, tidq->prev_packet, addr,
> tidq->period_instructions);
> if (err)
> return err;
> @@ -2051,9 +2085,9 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
> return etmq->buf_len;
> }
>
> -static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
> - struct cs_etm_packet *packet,
> - u64 end_addr)
> +static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq,
> + struct cs_etm_traceid_queue *tidq,
> + struct cs_etm_packet *packet, u64 end_addr)
> {
> /* Initialise to keep compiler happy */
> u16 instr16 = 0;
> @@ -2075,8 +2109,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
> * so below only read 2 bytes as instruction size for T32.
> */
> addr = end_addr - 2;
> - cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr16),
> - (u8 *)&instr16, 0);
> + cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
> + sizeof(instr16), (u8 *)&instr16);
> if ((instr16 & 0xFF00) == 0xDF00)
> return true;
>
> @@ -2091,8 +2125,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
> * +---------+---------+-------------------------+
> */
> addr = end_addr - 4;
> - cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
> - (u8 *)&instr32, 0);
> + cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
> + sizeof(instr32), (u8 *)&instr32);
> if ((instr32 & 0x0F000000) == 0x0F000000 &&
> (instr32 & 0xF0000000) != 0xF0000000)
> return true;
> @@ -2108,8 +2142,8 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
> * +-----------------------+---------+-----------+
> */
> addr = end_addr - 4;
> - cs_etm__mem_access(etmq, trace_chan_id, addr, sizeof(instr32),
> - (u8 *)&instr32, 0);
> + cs_etm__frontend_mem_access(etmq, tidq, packet, addr,
> + sizeof(instr32), (u8 *)&instr32);
> if ((instr32 & 0xFFE0001F) == 0xd4000001)
> return true;
>
> @@ -2125,7 +2159,6 @@ static bool cs_etm__is_svc_instr(struct cs_etm_queue *etmq, u8 trace_chan_id,
> static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
> struct cs_etm_traceid_queue *tidq, u64 magic)
> {
> - u8 trace_chan_id = tidq->trace_chan_id;
> struct cs_etm_packet *packet = tidq->packet;
> struct cs_etm_packet *prev_packet = tidq->prev_packet;
>
> @@ -2140,7 +2173,7 @@ static bool cs_etm__is_syscall(struct cs_etm_queue *etmq,
> */
> if (magic == __perf_cs_etmv4_magic) {
> if (packet->exception_number == CS_ETMV4_EXC_CALL &&
> - cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
> + cs_etm__is_svc_instr(etmq, tidq, prev_packet,
> prev_packet->end_addr))
> return true;
> }
> @@ -2178,7 +2211,6 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
> struct cs_etm_traceid_queue *tidq,
> u64 magic)
> {
> - u8 trace_chan_id = tidq->trace_chan_id;
> struct cs_etm_packet *packet = tidq->packet;
> struct cs_etm_packet *prev_packet = tidq->prev_packet;
>
> @@ -2204,7 +2236,7 @@ static bool cs_etm__is_sync_exception(struct cs_etm_queue *etmq,
> * (SMC, HVC) are taken as sync exceptions.
> */
> if (packet->exception_number == CS_ETMV4_EXC_CALL &&
> - !cs_etm__is_svc_instr(etmq, trace_chan_id, prev_packet,
> + !cs_etm__is_svc_instr(etmq, tidq, prev_packet,
> prev_packet->end_addr))
> return true;
>
> @@ -2228,7 +2260,6 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
> {
> struct cs_etm_packet *packet = tidq->packet;
> struct cs_etm_packet *prev_packet = tidq->prev_packet;
> - u8 trace_chan_id = tidq->trace_chan_id;
> u64 magic;
> int ret;
>
> @@ -2309,11 +2340,11 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
> if (prev_packet->flags == (PERF_IP_FLAG_BRANCH |
> PERF_IP_FLAG_RETURN |
> PERF_IP_FLAG_INTERRUPT) &&
> - cs_etm__is_svc_instr(etmq, trace_chan_id,
> - packet, packet->start_addr))
> + cs_etm__is_svc_instr(etmq, tidq, packet, packet->start_addr)) {
> prev_packet->flags = PERF_IP_FLAG_BRANCH |
> PERF_IP_FLAG_RETURN |
> PERF_IP_FLAG_SYSCALLRET;
> + }
> break;
> case CS_ETM_DISCONTINUITY:
> /*
> @@ -2394,6 +2425,7 @@ static int cs_etm__set_sample_flags(struct cs_etm_queue *etmq,
> PERF_IP_FLAG_RETURN |
> PERF_IP_FLAG_INTERRUPT;
> break;
> + case CS_ETM_CONTEXT:
> case CS_ETM_EMPTY:
> default:
> break;
> @@ -2469,6 +2501,19 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
> */
> cs_etm__sample(etmq, tidq);
> break;
> + case CS_ETM_CONTEXT:
> + /*
> + * Update context but don't swap packet. Keep the
> + * previous one for branch source address info, if
> + * tracing the kernel the context packet will be emitted
> + * between two ranges.
> + */
> + ret = cs_etm__etmq_update_thread(etmq, tidq->packet->el,
> + tidq->packet->tid,
> + &tidq->frontend_thread);
> + if (ret)
> + goto out;
> + break;
> case CS_ETM_EXCEPTION:
> case CS_ETM_EXCEPTION_RET:
> /*
> @@ -2497,6 +2542,7 @@ static int cs_etm__process_traceid_queue(struct cs_etm_queue *etmq,
> }
> }
>
> +out:
> return ret;
> }
>
> @@ -2620,7 +2666,7 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
> if (!tidq)
> continue;
>
> - if (tid == -1 || thread__tid(tidq->thread) == tid)
> + if (tid == -1 || thread__tid(tidq->frontend_thread) == tid)
> cs_etm__run_per_thread_timeless_decoder(etmq);
> } else
> cs_etm__run_per_cpu_timeless_decoder(etmq);
> @@ -3328,7 +3374,7 @@ static int cs_etm__create_queue_decoders(struct cs_etm_queue *etmq)
> */
> if (cs_etm_decoder__add_mem_access_cb(etmq->decoder,
> 0x0L, ((u64) -1L),
> - cs_etm__mem_access))
> + cs_etm__decoder_mem_access))
> goto out_free_decoder;
>
> zfree(&t_params);
> diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
> index aa9bb4a32eca..b81099c2b301 100644
> --- a/tools/perf/util/cs-etm.h
> +++ b/tools/perf/util/cs-etm.h
> @@ -158,6 +158,7 @@ enum cs_etm_sample_type {
> CS_ETM_DISCONTINUITY,
> CS_ETM_EXCEPTION,
> CS_ETM_EXCEPTION_RET,
> + CS_ETM_CONTEXT,
> };
>
> enum cs_etm_isa {
> @@ -184,6 +185,8 @@ struct cs_etm_packet {
> u8 last_instr_size;
> u8 trace_chan_id;
> int cpu;
> + int el;
> + pid_t tid;
> };
>
> #define CS_ETM_PACKET_MAX_BUFFER 1024
> @@ -259,8 +262,9 @@ enum cs_etm_pid_fmt {
> #include <opencsd/ocsd_if_types.h>
> int cs_etm__get_cpu(struct cs_etm_queue *etmq, u8 trace_chan_id, int *cpu);
> enum cs_etm_pid_fmt cs_etm__get_pid_fmt(struct cs_etm_queue *etmq);
> -int cs_etm__etmq_set_tid_el(struct cs_etm_queue *etmq, pid_t tid,
> - u8 trace_chan_id, ocsd_ex_level el);
> +int cs_etm__etmq_update_decode_context(struct cs_etm_queue *etmq,
> + u8 trace_chan_id, ocsd_ex_level el,
> + pid_t tid);
> bool cs_etm__etmq_is_timeless(struct cs_etm_queue *etmq);
> void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
> u8 trace_chan_id);
>
> --
> 2.34.1
>
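For readers following the split between tidq->decode_thread and tidq->frontend_thread, here is a minimal standalone sketch of the idea, not the perf code itself. It assumes a simplified model in which the decoder records a context change immediately, while the frontend only applies it when it reaches the queued CS_ETM_CONTEXT packet. All toy_* names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the perf structures. */
enum toy_packet_type { TOY_RANGE, TOY_CONTEXT };

struct toy_packet {
	enum toy_packet_type type;
	int tid;	/* only meaningful for TOY_CONTEXT packets */
};

struct toy_queue {
	int decode_tid;		/* context seen by the decoder, ahead in time */
	int frontend_tid;	/* context used when attributing samples */
};

/* Decoder side: the new context takes effect as soon as it is decoded. */
static void toy_decode_context(struct toy_queue *q, int tid)
{
	q->decode_tid = tid;
}

/*
 * Frontend side: consume packets in order. A context packet updates the
 * frontend tid; a range packet is attributed to whatever tid the frontend
 * holds at that point. Returns the number of ranges written to out_tids.
 */
static size_t toy_frontend_process(struct toy_queue *q,
				   const struct toy_packet *pkts, size_t n,
				   int *out_tids)
{
	size_t i, nr = 0;

	for (i = 0; i < n; i++) {
		if (pkts[i].type == TOY_CONTEXT)
			q->frontend_tid = pkts[i].tid;
		else
			out_tids[nr++] = q->frontend_tid;
	}
	return nr;
}
```

In this model a range queued before the context packet keeps the old tid even though the decoder has already moved on to the new one, which is the misattribution the queued context packet is meant to avoid.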
end of thread, newest message 2026-06-03 20:14 UTC
Thread overview: 28+ messages
2026-06-03 10:17 [PATCH v3 00/19] perf cs-etm: Queue context packets for frontend James Clark
2026-06-03 10:17 ` [PATCH v3 01/19] " James Clark
2026-06-03 10:32 ` sashiko-bot
2026-06-03 20:14 ` Amir Ayupov
2026-06-03 10:17 ` [PATCH v3 02/19] perf test: Add workload-ctl option James Clark
2026-06-03 10:17 ` [PATCH v3 03/19] perf test: Add a workload that forces context switches James Clark
2026-06-03 10:28 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 04/19] perf test cs-etm: Test process attribution James Clark
2026-06-03 10:17 ` [PATCH v3 05/19] perf test: Add deterministic workload James Clark
2026-06-03 10:24 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 06/19] perf test cs-etm: Replace unroll loop thread with deterministic decode test James Clark
2026-06-03 10:30 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 07/19] perf test cs-etm: Remove asm_pure_loop test James Clark
2026-06-03 10:17 ` [PATCH v3 08/19] perf test cs-etm: Replace memcpy test with raw dump stress test James Clark
2026-06-03 10:17 ` [PATCH v3 09/19] perf test: Add named_threads workload James Clark
2026-06-03 10:26 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 10/19] perf test cs-etm: Test decoding for concurrent threads test James Clark
2026-06-03 10:17 ` [PATCH v3 11/19] perf test cs-etm: Remove duplicate branch tests James Clark
2026-06-03 10:17 ` [PATCH v3 12/19] perf test cs-etm: Skip if not root James Clark
2026-06-03 10:17 ` [PATCH v3 13/19] perf test cs-etm: Reduce snapshot size James Clark
2026-06-03 10:17 ` [PATCH v3 14/19] perf test cs-etm: Speed up basic test James Clark
2026-06-03 10:17 ` [PATCH v3 15/19] perf test cs-etm: Remove unused Coresight workloads James Clark
2026-06-03 10:41 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 16/19] perf test cs-etm: Make disassembly test use kcore James Clark
2026-06-03 10:36 ` sashiko-bot
2026-06-03 10:17 ` [PATCH v3 17/19] perf test cs-etm: Add all branch instructions to test James Clark
2026-06-03 10:17 ` [PATCH v3 18/19] perf test cs-etm: Speed up disassembly test James Clark
2026-06-03 10:17 ` [PATCH v3 19/19] perf test cs-etm: Move existing tests to coresight folder James Clark