From: Amir Ayupov <aaupov@meta.com>
To: Suzuki K Poulose <suzuki.poulose@arm.com>,
Mike Leach <mike.leach@arm.com>,
James Clark <james.clark@linaro.org>, Leo Yan <leo.yan@arm.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
John Garry <john.g.garry@oracle.com>,
Will Deacon <will@kernel.org>, <coresight@lists.linaro.org>,
<linux-arm-kernel@lists.infradead.org>,
<linux-perf-users@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Subject: [PATCH] perf cs-etm: stamp pid/tid/EL on each buffered packet to fix cross-pid attribution
Date: Thu, 14 May 2026 19:11:34 -0700 [thread overview]
Message-ID: <20260515021135.1729028-1-aaupov@meta.com> (raw)
In a system-wide `perf record -e cs_etm/.../u` capture on aarch64,
synthesized samples emitted by `perf script --itrace=il64` are
sometimes attributed to the WRONG sample.pid/tid (and to the wrong
EL/cpumode) for the chunk of branches that straddle a context-switch
boundary on a CPU. A branch actually retired by process A is emitted
with sample.pid set to the thread that next ran on the same CPU.
Mechanism:
1. ETM emits CONTEXTIDR/EL packets in-stream when the kernel updates
CONTEXTIDR_EL1 on context switch / EL change. OpenCSD turns these
into OCSD_GEN_TRC_ELEM_PE_CONTEXT elements interleaved with
OCSD_GEN_TRC_ELEM_INSTR_RANGE elements for retired branch ranges.
2. cs_etm_decoder__buffer_range() queues each INSTR_RANGE into
packet_queue->packet_buffer[]; packets carry start/end addrs,
instr_count, last-instruction info, etc., but NO owner identity.
3. PE_CONTEXT goes through cs_etm_decoder__set_tid() ->
cs_etm__set_thread(), which immediately mutates tidq->thread and
tidq->el. Queued packets are not drained first; reset_timestamp()
is called so the next TIMESTAMP triggers OCSD_RESP_WAIT and a
drain.
4. By drain time in cs_etm__process_traceid_queue() ->
cs_etm__sample(), sample.pid/tid is read from the now-mutated
tidq->thread and sample.cpumode from the now-mutated tidq->el.
Pre-context INSTR_RANGEs get the post-context owner.
The same race affects branch samples via tidq->prev_packet_thread /
tidq->prev_packet_el, captured at packet-swap time from
tidq->thread / tidq->el (which may already have flipped).
This is independent of PERF_RECORD_SWITCH_CPU_WIDE, which is
deliberately not used to assign sample identity in this path. The
bug applies to any cs_etm capture with in-stream CONTEXTIDR
(PIDFMT_CTXTID or PIDFMT_CTXTID2).
Effect on downstream tools: branches that should belong to the
previous thread on the CPU get attributed to the next thread. When
the two threads share a binary, leaked branches' VAs land in the
wrong thread's mappings; samples whose IPs land in r-x mappings
silently pollute that binary's profile, while samples landing in
R-only/RW mappings show up as out-of-range / non-text samples.
Either way, AutoFDO/BOLT profiles built from `perf script --itrace`
output of system-wide cs_etm captures contain misattributed samples.
Concrete example from `perf script --itrace=il64` of the same
captured branch (same timestamp, same IP, same from/to addrs) before
and after this fix:
before: launcher_multia 2638146/2638146 705897.219172: \
fffcda6b124c 0xfffcda641958/0xfffcda6b123c
after: ws-tcf-sr-io13 2736581/2741587 705897.219172: \
fffcda6b124c 0xfffcda641958/0xfffcda6b123c
The branch was retired by ws-tcf-sr-io13 (tid 2741587) but, before
the fix, was attributed to launcher_multia (the next thread to run on
that CPU after the context switch). After the fix, it is correctly
attributed to ws-tcf-sr-io13.
Why not "drain on PE_CONTEXT then switch" (deferred-set_thread):
tidq->thread has two consumers \u2014 sample emission needs the OUTGOING
identity for queued packets, but cs_etm__mem_access() needs the
CURRENT thread's maps to fetch instruction bytes for OpenCSD. The
two needs are temporally inverted; a single tidq->thread cannot
serve both. Keeping tidq->thread current and stamping owner identity
per packet is the only design that decouples them cleanly.
Fix: capture the owning pid/tid/EL on each buffered packet at
cs_etm_decoder__buffer_packet() time (before any subsequent
PE_CONTEXT can mutate tidq->thread / tidq->el), and read them at
sample emission time.
- struct cs_etm_packet gains pid_t pid, pid_t tid, int el (storing
an ocsd_ex_level value; typed as int so the struct does not
depend on OpenCSD headers, which are only included inside
HAVE_CSTRACE_SUPPORT).
- cs_etm__etmq_get_pid_tid_el() (formerly cs_etm__etmq_get_pid_tid)
returns all three.
- cs_etm__synth_instruction_sample() reads sample.pid / sample.tid
from tidq->packet->{pid,tid} and derives sample.cpumode from
tidq->packet->el.
- cs_etm__synth_branch_sample() reads sample.pid / sample.tid /
cpumode from tidq->prev_packet->{pid,tid,el}.
- The separate prev_packet_thread / prev_packet_el bookkeeping in
cs_etm__packet_swap() / cs_etm__init_traceid_queue() /
cs_etm__free_traceid_queues() is removed; the per-packet stamp
on prev_packet now carries that information.
Cost: 12 bytes added to struct cs_etm_packet (~12-16 KB per
packet_queue with CS_ETM_PACKET_MAX_BUFFER=1024), 16 bytes saved per
cs_etm_traceid_queue (one struct thread * + one ocsd_ex_level).
A residual gap: cs_etm__copy_insn() reads sample.insn bytes via
cs_etm__mem_access(), which still uses tidq->thread (the current
thread), so the inline insn bytes for an outgoing-thread sample may
be looked up against the wrong address space. Fixing this requires
threading the packet's owner pid through cs_etm__mem_access and is
left for a follow-up. sample.ip / sample.pid attribution \u2014 what
AutoFDO/BOLT consume \u2014 is correct.
A workaround for users on existing perf binaries:
`perf record -p PIDS \u2026` programs the ETM
TRCCONTEXTIDCTLR/TRCVMIDCCTLR registers so the trace stream itself
never carries foreign-PID instructions, eliminating the leak at the
trace source.
Signed-off-by: Amir Ayupov <aaupov@meta.com>
Signed-off-by: Amir Ayupov <aaupov@fb.com>
---
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 12 +++
tools/perf/util/cs-etm.c | 75 +++++++++++++------
tools/perf/util/cs-etm.h | 25 +++++++
3 files changed, 88 insertions(+), 24 deletions(-)
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index dee3020ceaa91..ed99dfc7b0f8d 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -402,6 +402,18 @@ cs_etm_decoder__buffer_packet(struct cs_etm_queue *etmq,
packet_queue->packet_buffer[et].flags = 0;
packet_queue->packet_buffer[et].exception_number = UINT32_MAX;
packet_queue->packet_buffer[et].trace_chan_id = trace_chan_id;
+ /*
+ * Stamp the owner thread (pid/tid) onto the packet at buffer time.
+ * A subsequent OCSD_GEN_TRC_ELEM_PE_CONTEXT element will mutate
+ * tidq->thread before this packet is emitted as a sample; recording
+ * the identity here keeps each buffered packet correctly attributed
+ * to the thread that retired its instructions. See
+ * cs_etm__synth_instruction_sample().
+ */
+ cs_etm__etmq_get_pid_tid_el(etmq, trace_chan_id,
+ &packet_queue->packet_buffer[et].pid,
+ &packet_queue->packet_buffer[et].tid,
+ &packet_queue->packet_buffer[et].el);
if (packet_queue->packet_count == CS_ETM_PACKET_MAX_BUFFER - 1)
return OCSD_RESP_WAIT;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4c..ee3437488b753 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -86,8 +86,6 @@ struct cs_etm_traceid_queue {
size_t last_branch_pos;
union perf_event *event_buf;
struct thread *thread;
- struct thread *prev_packet_thread;
- ocsd_ex_level prev_packet_el;
ocsd_ex_level el;
struct branch_stack *last_branch;
struct branch_stack *last_branch_rb;
@@ -614,10 +612,9 @@ static int cs_etm__init_traceid_queue(struct cs_etm_queue *etmq,
queue = &etmq->etm->queues.queue_array[etmq->queue_nr];
tidq->trace_chan_id = trace_chan_id;
- tidq->el = tidq->prev_packet_el = ocsd_EL_unknown;
+ tidq->el = ocsd_EL_unknown;
tidq->thread = machine__findnew_thread(&etm->session->machines.host, -1,
queue->tid);
- tidq->prev_packet_thread = machine__idle_thread(&etm->session->machines.host);
tidq->packet = zalloc(sizeof(struct cs_etm_packet));
if (!tidq->packet)
@@ -740,6 +737,26 @@ struct cs_etm_packet_queue
return NULL;
}
+void cs_etm__etmq_get_pid_tid_el(struct cs_etm_queue *etmq, u8 trace_chan_id,
+ pid_t *pid, pid_t *tid, int *el)
+{
+ struct cs_etm_traceid_queue *tidq;
+
+ *pid = -1;
+ *tid = -1;
+ *el = ocsd_EL_unknown;
+
+ tidq = cs_etm__etmq_get_traceid_queue(etmq, trace_chan_id);
+ if (!tidq)
+ return;
+
+ *el = tidq->el;
+ if (tidq->thread) {
+ *pid = thread__pid(tidq->thread);
+ *tid = thread__tid(tidq->thread);
+ }
+}
+
static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
struct cs_etm_traceid_queue *tidq)
{
@@ -748,23 +765,15 @@ static void cs_etm__packet_swap(struct cs_etm_auxtrace *etm,
if (etm->synth_opts.branches || etm->synth_opts.last_branch ||
etm->synth_opts.instructions) {
/*
- * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
- * the next incoming packet.
- *
- * Threads and exception levels are also tracked for both the
- * previous and current packets. This is because the previous
- * packet is used for the 'from' IP for branch samples, so the
- * thread at that time must also be assigned to that sample.
- * Across discontinuity packets the thread can change, so by
- * tracking the thread for the previous packet the branch sample
- * will have the correct info.
+ * Rotate PACKET into PREV_PACKET so the next decoded packet
+ * lands in PACKET. Owner identity (pid/tid/el) travels with
+ * the packet itself — it was stamped at
+ * cs_etm_decoder__buffer_packet() time — so no separate
+ * thread/EL tracking is needed here.
*/
tmp = tidq->packet;
tidq->packet = tidq->prev_packet;
tidq->prev_packet = tmp;
- tidq->prev_packet_el = tidq->el;
- thread__put(tidq->prev_packet_thread);
- tidq->prev_packet_thread = thread__get(tidq->thread);
}
}
@@ -938,7 +947,6 @@ static void cs_etm__free_traceid_queues(struct cs_etm_queue *etmq)
/* Free this traceid_queue from the array */
tidq = etmq->traceid_queues[idx];
thread__zput(tidq->thread);
- thread__zput(tidq->prev_packet_thread);
zfree(&tidq->event_buf);
zfree(&tidq->last_branch);
zfree(&tidq->last_branch_rb);
@@ -1570,15 +1578,24 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
perf_sample__init(&sample, /*all=*/true);
event->sample.header.type = PERF_RECORD_SAMPLE;
- event->sample.header.misc = cs_etm__cpu_mode(etmq, addr, tidq->el);
+ event->sample.header.misc = cs_etm__cpu_mode(etmq, addr,
+ (ocsd_ex_level)tidq->packet->el);
event->sample.header.size = sizeof(struct perf_event_header);
/* Set time field based on etm auxtrace config. */
sample.time = cs_etm__resolve_sample_time(etmq, tidq);
sample.ip = addr;
- sample.pid = thread__pid(tidq->thread);
- sample.tid = thread__tid(tidq->thread);
+ /*
+ * Read pid/tid (and EL above for cpumode) from the packet's
+ * stamped owner identity rather than tidq->thread / tidq->el,
+ * which reflect the thread that is current at sample emission
+ * time. A PE_CONTEXT element delivered between buffer time and
+ * emit time would otherwise misattribute pre-context packets to
+ * the next thread/EL on the CPU.
+ */
+ sample.pid = tidq->packet->pid;
+ sample.tid = tidq->packet->tid;
sample.id = etmq->etm->instructions_id;
sample.stream_id = etmq->etm->instructions_id;
sample.period = period;
@@ -1586,6 +1603,16 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
sample.flags = tidq->prev_packet->flags;
sample.cpumode = event->sample.header.misc;
+ /*
+ * Note: cs_etm__copy_insn() reads sample.insn bytes via
+ * cs_etm__mem_access(), which uses tidq->thread (the *current*
+ * thread). For samples whose pid/tid were stamped from an
+ * outgoing thread that has since been replaced by a PE_CONTEXT,
+ * the inline insn bytes may be looked up against the wrong
+ * address space. sample.ip / sample.pid attribution is correct;
+ * fixing the insn bytes requires threading the packet's owner
+ * pid through cs_etm__mem_access and is left for a follow-up.
+ */
cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
if (etm->synth_opts.last_branch)
@@ -1631,15 +1658,15 @@ static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
event->sample.header.type = PERF_RECORD_SAMPLE;
event->sample.header.misc = cs_etm__cpu_mode(etmq, ip,
- tidq->prev_packet_el);
+ (ocsd_ex_level)tidq->prev_packet->el);
event->sample.header.size = sizeof(struct perf_event_header);
/* Set time field based on etm auxtrace config. */
sample.time = cs_etm__resolve_sample_time(etmq, tidq);
sample.ip = ip;
- sample.pid = thread__pid(tidq->prev_packet_thread);
- sample.tid = thread__tid(tidq->prev_packet_thread);
+ sample.pid = tidq->prev_packet->pid;
+ sample.tid = tidq->prev_packet->tid;
sample.addr = cs_etm__first_executed_instr(tidq->packet);
sample.id = etmq->etm->branches_id;
sample.stream_id = etmq->etm->branches_id;
diff --git a/tools/perf/util/cs-etm.h b/tools/perf/util/cs-etm.h
index aa9bb4a32ecaf..6ba47604b8c52 100644
--- a/tools/perf/util/cs-etm.h
+++ b/tools/perf/util/cs-etm.h
@@ -10,6 +10,7 @@
#include "debug.h"
#include "util/event.h"
#include <linux/bits.h>
+#include <sys/types.h>
struct perf_session;
struct perf_pmu;
@@ -184,6 +185,20 @@ struct cs_etm_packet {
u8 last_instr_size;
u8 trace_chan_id;
int cpu;
+ /*
+ * Owner identity captured at cs_etm_decoder__buffer_packet() time.
+ * A subsequent PE_CONTEXT element will mutate tidq->thread/tidq->el
+ * before this packet is emitted as a sample; stamping pid/tid/el on
+ * the packet keeps each one attributed to the thread that actually
+ * retired its instructions and the EL it ran at. Read at sample
+ * emission time by cs_etm__synth_instruction_sample() and
+ * cs_etm__synth_branch_sample(). 'el' holds an ocsd_ex_level value
+ * but is typed as int so the struct does not depend on OpenCSD
+ * headers (which are only included inside HAVE_CSTRACE_SUPPORT).
+ */
+ pid_t pid;
+ pid_t tid;
+ int el;
};
#define CS_ETM_PACKET_MAX_BUFFER 1024
@@ -266,6 +281,16 @@ void cs_etm__etmq_set_traceid_queue_timestamp(struct cs_etm_queue *etmq,
u8 trace_chan_id);
struct cs_etm_packet_queue
*cs_etm__etmq_get_packet_queue(struct cs_etm_queue *etmq, u8 trace_chan_id);
+/*
+ * Read the pid/tid/EL currently associated with the given trace_chan_id.
+ * Called from cs_etm_decoder__buffer_packet() to stamp owner identity on
+ * each buffered packet at buffer time, before any subsequent PE_CONTEXT
+ * (CONTEXTIDR / EL change) can mutate tidq->thread / tidq->el. The stamp
+ * is consumed at sample emission time so that each sample is attributed
+ * to the thread/EL that actually retired its instructions.
+ */
+void cs_etm__etmq_get_pid_tid_el(struct cs_etm_queue *etmq, u8 trace_chan_id,
+ pid_t *pid, pid_t *tid, int *el);
int cs_etm__process_auxtrace_info_full(union perf_event *event __maybe_unused,
struct perf_session *session __maybe_unused);
u64 cs_etm__convert_sample_time(struct cs_etm_queue *etmq, u64 cs_timestamp);
--
2.52.0
reply other threads:[~2026-05-15 2:18 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260515021135.1729028-1-aaupov@meta.com \
--to=aaupov@meta.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=coresight@lists.linaro.org \
--cc=irogers@google.com \
--cc=james.clark@linaro.org \
--cc=john.g.garry@oracle.com \
--cc=jolsa@kernel.org \
--cc=leo.yan@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mike.leach@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=stable@vger.kernel.org \
--cc=suzuki.poulose@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox