Linux Perf Users
 help / color / mirror / Atom feed
From: Amir Ayupov <aaupov@meta.com>
To: Suzuki K Poulose <suzuki.poulose@arm.com>,
	Mike Leach <mike.leach@arm.com>,
	James Clark <james.clark@linaro.org>, Leo Yan <leo.yan@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	John Garry <john.g.garry@oracle.com>,
	Will Deacon <will@kernel.org>, <coresight@lists.linaro.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-perf-users@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Subject: [PATCH] perf cs-etm: stamp pid/tid/EL on each buffered packet to fix cross-pid attribution
Date: Thu, 14 May 2026 19:11:34 -0700	[thread overview]
Message-ID: <20260515021135.1729028-1-aaupov@meta.com> (raw)

In a system-wide `perf record -e cs_etm/.../u` capture on aarch64,
synthesized samples emitted by `perf script --itrace=il64` are
sometimes attributed to the WRONG sample.pid/tid (and to the wrong
EL/cpumode) for the chunk of branches that straddle a context-switch
boundary on a CPU. A branch actually retired by process A is emitted
with sample.pid set to the thread that next ran on the same CPU.

Mechanism:
  1. ETM emits CONTEXTIDR/EL packets in-stream when the kernel updates
     CONTEXTIDR_EL1 on context switch / EL change. OpenCSD turns these
     into OCSD_GEN_TRC_ELEM_PE_CONTEXT elements interleaved with
     OCSD_GEN_TRC_ELEM_INSTR_RANGE elements for retired branch ranges.
  2. cs_etm_decoder__buffer_range() queues each INSTR_RANGE into
     packet_queue->packet_buffer[]; packets carry start/end addrs,
     instr_count, last-instruction info, etc., but NO owner identity.
  3. PE_CONTEXT goes through cs_etm_decoder__set_tid() ->
     cs_etm__set_thread(), which immediately mutates tidq->thread and
     tidq->el. Queued packets are not drained first; reset_timestamp()
     is called so the next TIMESTAMP triggers OCSD_RESP_WAIT and a
     drain.
  4. By drain time in cs_etm__process_traceid_queue() ->
     cs_etm__sample(), sample.pid/tid is read from the now-mutated
     tidq->thread and sample.cpumode from the now-mutated tidq->el.
     Pre-context INSTR_RANGEs get the post-context owner.

The same race affects branch samples via tidq->prev_packet_thread /
tidq->prev_packet_el, captured at packet-swap time from
tidq->thread / tidq->el (which may already have flipped).

This is independent of PERF_RECORD_SWITCH_CPU_WIDE, which is
deliberately not used to assign sample identity in this path. The
bug applies to any cs_etm capture with in-stream CONTEXTIDR
(PIDFMT_CTXTID or PIDFMT_CTXTID2).

Effect on downstream tools: branches that should belong to the
previous thread on the CPU get attributed to the next thread. When
the two threads share a binary, leaked branches' VAs land in the
wrong thread's mappings; samples whose IPs land in r-x mappings
silently pollute that binary's profile, while samples landing in
R-only/RW mappings show up as out-of-range / non-text samples.
Either way, AutoFDO/BOLT profiles built from `perf script --itrace`
output of system-wide cs_etm captures contain misattributed samples.

Concrete example from `perf script --itrace=il64` of the same
captured branch (same timestamp, same IP, same from/to addrs) before
and after this fix:

  before: launcher_multia 2638146/2638146 705897.219172: \
              fffcda6b124c 0xfffcda641958/0xfffcda6b123c
  after:  ws-tcf-sr-io13  2736581/2741587 705897.219172: \
              fffcda6b124c 0xfffcda641958/0xfffcda6b123c

The branch was retired by ws-tcf-sr-io13 (tid 2741587) but, before
the fix, was attributed to launcher_multia (the next thread to run on
that CPU after the context switch). After the fix, it is correctly
attributed to ws-tcf-sr-io13.

Why not "drain on PE_CONTEXT then switch" (deferred-set_thread):
tidq->thread has two consumers \u2014 sample emission needs the OUTGOING
identity for queued packets, but cs_etm__mem_access() needs the
CURRENT thread's maps to fetch instruction bytes for OpenCSD. The
two needs are temporally inverted; a single tidq->thread cannot
serve both. Keeping tidq->thread current and stamping owner identity
per packet is the only design that decouples them cleanly.

Fix: capture the owning pid/tid/EL on each buffered packet at
cs_etm_decoder__buffer_packet() time (before any subsequent
PE_CONTEXT can mutate tidq->thread / tidq->el), and read them at
sample emission time.

  - struct cs_etm_packet gains pid_t pid, pid_t tid, int el (storing
    an ocsd_ex_level value; typed as int so the struct does not
    depend on OpenCSD headers, which are only included inside
    HAVE_CSTRACE_SUPPORT).
  - cs_etm__etmq_get_pid_tid_el() (formerly cs_etm__etmq_get_pid_tid)
    returns all three.
  - cs_etm__synth_instruction_sample() reads sample.pid / sample.tid
    from tidq->packet->{pid,tid} and derives sample.cpumode from
    tidq->packet->el.
  - cs_etm__synth_branch_sample() reads sample.pid / sample.tid /
    cpumode from tidq->prev_packet->{pid,tid,el}.
  - The separate prev_packet_thread / prev_packet_el bookkeeping in
    cs_etm__packet_swap() / cs_etm__init_traceid_queue() /
    cs_etm__free_traceid_queues() is removed; the per-packet stamp
    on prev_packet now carries that information.

Cost: 12 bytes added to struct cs_etm_packet (~12-16 KB per
packet_queue with CS_ETM_PACKET_MAX_BUFFER=1024), 16 bytes saved per
cs_etm_traceid_queue (one struct thread * + one ocsd_ex_level).

A residual gap: cs_etm__copy_insn() reads sample.insn bytes via
cs_etm__mem_access(), which still uses tidq->thread (the current
thread), so the inline insn bytes for an outgoing-thread sample may
be looked up against the wrong address space. Fixing this requires
threading the packet's owner pid through cs_etm__mem_access and is
left for a follow-up. sample.ip / sample.pid attribution \u2014 what
AutoFDO/BOLT consume \u2014 is correct.

A workaround for users on existing perf binaries:
`perf record -p PIDS \u2026` programs the ETM
TRCCONTEXTIDCTLR/TRCVMIDCCTLR registers so the trace stream itself
never carries foreign-PID instructions, eliminating the leak at the
trace source.

Signed-off-by: Amir Ayupov <aaupov@meta.com>
Signed-off-by: Amir Ayupov <aaupov@fb.com>
---
 .../perf/util/cs-etm-decoder/cs-etm-decoder.c | 12 +++
 tools/perf/util/cs-etm.c                      | 75 +++++++++++++------
 tools/perf/util/cs-etm.h                      | 25 +++++++
 3 files changed, 88 insertions(+), 24 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index dee3020ceaa91..ed99dfc7b0f8d 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,18 @@ cs_etm_decoder__buffer_packet(struct cs_etm_queue *etmq,
 	packet_queue->packet_buffer[et].flags = 0;
 	packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
 	packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+	/*
+	 * Stamp the owner thread (pid/tid) onto the packet at buffer time.
+	 * A subsequent OCSD_GEN_TRC_ELEM_PE_CONTEXT element will mutate
+	 * tidq->thread before this packet is emitted as a sample; recording
+	 * the identity here keeps each buffered packet correctly attributed
+	 * to the thread that retired its instructions. See
+	 * cs_etm__synth_instruction_sample().
+	 */
+	cs_etm__etmq_get_pid_tid_el(etmq, trace_chan_id,
+				    &packet_queue->packet_buffer[et].pid,
+				    &packet_queue->packet_buffer[et].tid,
+				    &packet_queue->packet_buffer[et].el);
 
 	if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4c..ee3437488b753 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -86,8 +86,6 @@ struct cs_etm_traceid_queue {
 	size_t last_branch_pos;
 	union perf_event *event_buf;
 	struct thread *thread;
-	struct thread *prev_packet_thread;
-	ocsd_ex_level prev_packet_el;
 	ocsd_ex_level el;
 	struct branch_stack *last_branch;
 	struct branch_stack *last_branch_rb;
@@ -614,10 +612,9 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
 
 	queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
 	tidq->trace_chan_id = trace_chan_id;
-	tidq->el = tidq->prev_packet_el = ocsd_EL_unknown;
+	tidq->el = ocsd_EL_unknown;
 	tidq->thread = machine__findnew_thread(&etm->session->machines.host, -1,
 					       queue->tid);
-	tidq->prev_packet_thread = machine__idle_thread(&etm->session->machines.host);
 
 	tidq->packet = zalloc(sizeof(struct cs_etm_packet));
 	if (!tidq->packet)
@@ -740,6 +737,26 @@ struct cs_etm_packet_queue
 	return NULL;
 }
 
+void cs_etm__etmq_get_pid_tid_el(struct cs_etm_queue *etmq, u8 trace_chan_id,
+				 pid_t *pid, pid_t *tid, int *el)
+{
+	struct cs_etm_traceid_queue *tidq;
+
+	*pid = -1;
+	*tid = -1;
+	*el  = ocsd_EL_unknown;
+
+	tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+	if (!tidq)
+		return;
+
+	*el = tidq->el;
+	if (tidq->thread) {
+		*pid = thread__pid(tidq->thread);
+		*tid = thread__tid(tidq->thread);
+	}
+}
+
 static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
 				struct cs_etm_traceid_queue *tidq)
 {
@@ -748,23 +765,15 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
 	if (etm->synth_opts.branches || etm->synth_opts.last_branch ||
 	    etm->synth_opts.instructions) {
 		/*
-		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
-		 * the next incoming packet.
-		 *
-		 * Threads and exception levels are also tracked for both the
-		 * previous and current packets. This is because the previous
-		 * packet is used for the 'from' IP for branch samples, so the
-		 * thread at that time must also be assigned to that sample.
-		 * Across discontinuity packets the thread can change, so by
-		 * tracking the thread for the previous packet the branch sample
-		 * will have the correct info.
+		 * Rotate PACKET into PREV_PACKET so the next decoded packet
+		 * lands in PACKET. Owner identity (pid/tid/el) travels with
+		 * the packet itself — it was stamped at
+		 * cs_etm_decoder__buffer_packet() time — so no separate
+		 * thread/EL tracking is needed here.
 		 */
 		tmp = tidq->packet;
 		tidq->packet = tidq->prev_packet;
 		tidq->prev_packet = tmp;
-		tidq->prev_packet_el = tidq->el;
-		thread__put(tidq->prev_packet_thread);
-		tidq->prev_packet_thread = thread__get(tidq->thread);
 	}
 }
 
@@ -938,7 +947,6 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
 		/* Free this traceid_queue from the array */
 		tidq = etmq->traceid_queues[idx];
 		thread__zput(tidq->thread);
-		thread__zput(tidq->prev_packet_thread);
 		zfree(&tidq->event_buf);
 		zfree(&tidq->last_branch);
 		zfree(&tidq->last_branch_rb);
@@ -1570,15 +1578,24 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	perf_sample__init(&sample, /*all=*/true);
 	event->sample.header.type = PERF_RECORD_SAMPLE;
-	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
+	event->sample.header.misc = cs_etm__cpu_mode(etmq, addr,
+						     (ocsd_ex_level)tidq->packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
 
 	sample.ip = addr;
-	sample.pid = thread__pid(tidq->thread);
-	sample.tid = thread__tid(tidq->thread);
+	/*
+	 * Read pid/tid (and EL above for cpumode) from the packet's
+	 * stamped owner identity rather than tidq->thread / tidq->el,
+	 * which reflect the thread that is current at sample emission
+	 * time. A PE_CONTEXT element delivered between buffer time and
+	 * emit time would otherwise misattribute pre-context packets to
+	 * the next thread/EL on the CPU.
+	 */
+	sample.pid = tidq->packet->pid;
+	sample.tid = tidq->packet->tid;
 	sample.id = etmq->etm->instructions_id;
 	sample.stream_id = etmq->etm->instructions_id;
 	sample.period = period;
@@ -1586,6 +1603,16 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
+	/*
+	 * Note: cs_etm__copy_insn() reads sample.insn bytes via
+	 * cs_etm__mem_access(), which uses tidq->thread (the *current*
+	 * thread). For samples whose pid/tid were stamped from an
+	 * outgoing thread that has since been replaced by a PE_CONTEXT,
+	 * the inline insn bytes may be looked up against the wrong
+	 * address space. sample.ip / sample.pid attribution is correct;
+	 * fixing the insn bytes requires threading the packet's owner
+	 * pid through cs_etm__mem_access and is left for a follow-up.
+	 */
 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
 
 	if (etm->synth_opts.last_branch)
@@ -1631,15 +1658,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
-						     tidq->prev_packet_el);
+						     (ocsd_ex_level)tidq->prev_packet->el);
 	event->sample.header.size = sizeof(struct perf_event_header);
 
 	/* Set time field based on etm auxtrace config. */
 	sample.time = cs_etm__resolve_sample_time(etmq, tidq);
 
 	sample.ip = ip;
-	sample.pid = thread__pid(tidq->prev_packet_thread);
-	sample.tid = thread__tid(tidq->prev_packet_thread);
+	sample.pid = tidq->prev_packet->pid;
+	sample.tid = tidq->prev_packet->tid;
 	sample.addr = cs_etm__first_executed_instr(tidq->packet);
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index aa9bb4a32ecaf..6ba47604b8c52 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -10,6 +10,7 @@
 #include "debug.h"
 #include "util/event.h"
 #include <linux/bits.h>
+#include <sys/types.h>
 
 struct perf_session;
 struct perf_pmu;
@@ -184,6 +185,20 @@ struct cs_etm_packet {
 	u8 last_instr_size;
 	u8 trace_chan_id;
 	int cpu;
+	/*
+	 * Owner identity captured at cs_etm_decoder__buffer_packet() time.
+	 * A subsequent PE_CONTEXT element will mutate tidq->thread/tidq->el
+	 * before this packet is emitted as a sample; stamping pid/tid/el on
+	 * the packet keeps each one attributed to the thread that actually
+	 * retired its instructions and the EL it ran at. Read at sample
+	 * emission time by cs_etm__synth_instruction_sample() and
+	 * cs_etm__synth_branch_sample(). 'el' holds an ocsd_ex_level value
+	 * but is typed as int so the struct does not depend on OpenCSD
+	 * headers (which are only included inside HAVE_CSTRACE_SUPPORT).
+	 */
+	pid_t pid;
+	pid_t tid;
+	int el;
 };
 
 #define CS_ETM_PACKET_MAX_BUFFER 1024
@@ -266,6 +281,16 @@ void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
 					      u8 trace_chan_id);
 struct cs_etm_packet_queue
 *cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id);
+/*
+ * Read the pid/tid/EL currently associated with the given trace_chan_id.
+ * Called from cs_etm_decoder__buffer_packet() to stamp owner identity on
+ * each buffered packet at buffer time, before any subsequent PE_CONTEXT
+ * (CONTEXTIDR / EL change) can mutate tidq->thread / tidq->el. The stamp
+ * is consumed at sample emission time so that each sample is attributed
+ * to the thread/EL that actually retired its instructions.
+ */
+void cs_etm__etmq_get_pid_tid_el(struct cs_etm_queue *etmq, u8 trace_chan_id,
+				 pid_t *pid, pid_t *tid, int *el);
 int cs_etm__process_auxtrace_info_full(union perf_event *event __maybe_unused,
 				       struct perf_session *session __maybe_unused);
 u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp);
-- 
2.52.0


                 reply	other threads:[~2026-05-15  2:18 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260515021135.1729028-1-aaupov@meta.com \
    --to=aaupov@meta.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=coresight@lists.linaro.org \
    --cc=irogers@google.com \
    --cc=james.clark@linaro.org \
    --cc=john.g.garry@oracle.com \
    --cc=jolsa@kernel.org \
    --cc=leo.yan@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mike.leach@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox