* [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes
@ 2026-04-28 7:03 Ian Rogers
2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria,
linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi
Cc: Ian Rogers
An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 ++--
tools/perf/builtin-inject.c | 83 ++++++++++++++++++++++++++++--
tools/perf/tests/dlfilter-test.c | 8 ++-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 7 ++-
tools/perf/util/cs-etm.c | 6 ++-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 13 +++--
tools/perf/util/synthetic-events.c | 25 ++++++---
tools/perf/util/synthetic-events.h | 6 ++-
10 files changed, 135 insertions(+), 30 deletions(-)
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks
2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
@ 2026-04-28 7:03 ` Ian Rogers
2026-04-28 23:19 ` Namhyung Kim
2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
2 siblings, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria,
linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi
Cc: Ian Rogers
Synthesizing branch stacks for Intel-PT highlighed an issue where
PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the
perf_event_attr branch_sample_type. This caused an incorrect size
calculation.
Fix the writing of the nr and hw_idx values during sample event
synthesis.
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/bench/inject-buildid.c | 9 ++++++---
tools/perf/builtin-inject.c | 12 +++++++++---
tools/perf/tests/dlfilter-test.c | 8 ++++++--
tools/perf/tests/sample-parsing.c | 5 +++--
tools/perf/util/arm-spe.c | 7 +++++--
tools/perf/util/cs-etm.c | 6 ++++--
tools/perf/util/intel-bts.c | 3 ++-
tools/perf/util/intel-pt.c | 7 +++++--
tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++-------
tools/perf/util/synthetic-events.h | 6 ++++--
10 files changed, 62 insertions(+), 26 deletions(-)
diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
index aad572a78d7f..bfd2c5ec9488 100644
--- a/tools/perf/bench/inject-buildid.c
+++ b/tools/perf/bench/inject-buildid.c
@@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso,
event.header.type = PERF_RECORD_SAMPLE;
event.header.misc = PERF_RECORD_MISC_USER;
- event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
-
- perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
+ event.header.size = perf_event__sample_event_size(&sample, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ perf_event__synthesize_sample(&event, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
return writen(data->input_pipe[1], &event, event.header.size);
}
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f174bc69cec4..0c51cb4250d1 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
- perf_event__synthesize_sample(event_copy, sample_type,
- evsel->core.attr.read_format, sample);
+ ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+ if (ret) {
+ pr_err("Failed to synthesize sample\n");
+ return ret;
+ }
return perf_event__repipe_synth(tool, event_copy);
}
@@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
sample_sw.period = sample->period;
sample_sw.time = sample->time;
perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
- evsel->core.attr.read_format, &sample_sw);
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, &sample_sw);
build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
perf_sample__exit(&sample_sw);
diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c
index e63790c61d53..204663571943 100644
--- a/tools/perf/tests/dlfilter-test.c
+++ b/tools/perf/tests/dlfilter-test.c
@@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid
event->header.type = PERF_RECORD_SAMPLE;
event->header.misc = PERF_RECORD_MISC_USER;
- event->header.size = perf_event__sample_event_size(&sample, sample_type, 0);
- err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
+ event->header.size = perf_event__sample_event_size(&sample, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ err = perf_event__synthesize_sample(event, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
if (err)
return test_result("perf_event__synthesize_sample() failed", TEST_FAIL);
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index a7327c942ca2..55f0b73ca20e 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
sample.read.one.lost = 1;
}
- sz = perf_event__sample_event_size(&sample, sample_type, read_format);
+ sz = perf_event__sample_event_size(&sample, sample_type, read_format,
+ evsel.core.attr.branch_sample_type);
bufsz = sz + 4096; /* Add a bit for overrun checking */
event = malloc(bufsz);
if (!event) {
@@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
event->header.size = sz;
err = perf_event__synthesize_sample(event, sample_type, read_format,
- &sample);
+ evsel.core.attr.branch_sample_type, &sample);
if (err) {
pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
"perf_event__synthesize_sample", sample_type, err);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index e5835042acdf..c4ed9f10e731 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ event->header.type = PERF_RECORD_SAMPLE;
+ event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ /*branch_sample_type=*/0, sample);
}
static inline int
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4..1ebc1a6a5e75 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
static int cs_etm__inject_event(union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ /*branch_sample_type=*/0, sample);
}
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 382255393fb3..0b18ebd13f7c 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
event.sample.header.size = bts->branches_event_size;
ret = perf_event__synthesize_sample(&event,
bts->branches_sample_type,
- 0, &sample);
+ /*read_format=*/0, /*branch_sample_type=*/0,
+ &sample);
if (ret)
return ret;
}
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..5142983e3243 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
static int intel_pt_inject_event(union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ /*branch_sample_type=*/0);
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ /*branch_sample_type=*/0, sample);
}
static inline int intel_pt_opt_inject(struct intel_pt *pt,
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..2461f25a4d7d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool,
return process(tool, (union perf_event *) &event, NULL, machine);
}
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format)
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format,
+ u64 branch_sample_type)
{
size_t sz, result = sizeof(struct perf_record_sample);
@@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
+ /* nr */
+ sz += sizeof(u64);
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
+ sz += sizeof(u64);
result += sz;
}
@@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
}
int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
- const struct perf_sample *sample)
+ u64 branch_sample_type, const struct perf_sample *sample)
{
__u64 *array;
size_t sz;
@@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
- memcpy(array, sample->branch_stack, sz);
+
+ *array++ = sample->branch_stack->nr;
+
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) {
+ if (sample->no_hw_idx)
+ *array++ = 0;
+ else
+ *array++ = sample->branch_stack->hw_idx;
+ }
+
+ memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz);
array = (void *)array + sz;
}
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index b0edad0c3100..8c7f49f9ccf5 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_
int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
-int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
+int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
+ u64 branch_sample_type, const struct perf_sample *sample);
int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine);
@@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session,
int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process);
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format);
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
+ u64 read_format, u64 branch_sample_type);
int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool,
struct target *target, struct perf_thread_map *threads,
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis
2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
@ 2026-04-28 7:03 ` Ian Rogers
2026-04-28 20:20 ` sashiko-bot
2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
2 siblings, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria,
linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi
Cc: Ian Rogers
When using "perf inject --itrace=L" to synthesize branch stacks from
AUX data, several issues caused failures with the generated file:
1. The synthesized samples were delivered without the
PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's
sample_type. Fixed by using sample_type | evsel->synth_sample_type
in intel_pt_deliver_synth_event.
2. The record layout was misaligned because of inconsistent handling
of PERF_SAMPLE_BRANCH_HW_INDEX. Fixed by explicitly writing nr and
hw_idx in perf_event__synthesize_sample.
3. Modifying evsel->core.attr.sample_type early in __cmd_inject caused
parse failures for subsequent records in the input file. Fixed by
moving this modification to just before writing the header.
4. perf_event__repipe_sample was narrowed to only synthesize samples
when branch stack injection was requested, and restored the use of
perf_inject__cut_auxtrace_sample as a fallback to preserve
functionality.
5. Potential Heap Overflow in perf_event__repipe_sample : Addressed by
adding a check that prints an error and returns -EFAULT if the
calculated event size exceeds PERF_SAMPLE_MAX_SIZE , as you
requested.
6. Header vs Payload Mismatch in __cmd_inject : Addressed by narrowing
the condition so that HEADER_BRANCH_STACK is only set in the file
header if add_last_branch was true.
7. NULL Pointer Dereference in intel-pt.c : Addressed by updating the
condition in intel_pt_do_synth_pebs_sample to fill sample.
branch_stack if it was synthesized, even if not in the original
sample_type .
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-inject.c | 87 ++++++++++++++++++++++++++++++++-----
tools/perf/util/intel-pt.c | 6 ++-
2 files changed, 81 insertions(+), 12 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 0c51cb4250d1..1f4d25a0efba 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -223,6 +223,11 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
tool);
int ret;
+ if (inject->itrace_synth_opts.add_last_branch) {
+ event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+
ret = perf_event__process_attr(tool, event, pevlist);
if (ret)
return ret;
@@ -375,7 +380,60 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
build_id__mark_dso_hit(tool, event, sample, evsel, machine);
- if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
+ if (inject->itrace_synth_opts.set &&
+ (inject->itrace_synth_opts.last_branch ||
+ inject->itrace_synth_opts.add_last_branch)) {
+ union perf_event *event_copy = (void *)inject->event_copy;
+ struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 };
+ int err;
+ size_t sz;
+ u64 orig_type = evsel->core.attr.sample_type;
+ u64 orig_branch_type = evsel->core.attr.branch_sample_type;
+
+ if (event_copy == NULL) {
+ inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE);
+ if (!inject->event_copy)
+ return -ENOMEM;
+
+ event_copy = (void *)inject->event_copy;
+ }
+
+ if (!sample->branch_stack)
+ sample->branch_stack = &dummy_bs;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ /* Temporarily add in type bits for synthesis. */
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+
+ if (sz > PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+
+ event_copy->header.type = PERF_RECORD_SAMPLE;
+ event_copy->header.misc = event->header.misc;
+ event_copy->header.size = sz;
+
+ err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+
+ if (err) {
+ pr_err("Failed to synthesize sample\n");
+ return err;
+ }
+ event = event_copy;
+ } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
event = perf_inject__cut_auxtrace_sample(inject, event, sample);
if (IS_ERR(event))
return PTR_ERR(event);
@@ -463,13 +521,9 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
- ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
- evsel->core.attr.read_format,
- evsel->core.attr.branch_sample_type, sample);
- if (ret) {
- pr_err("Failed to synthesize sample\n");
- return ret;
- }
+ perf_event__synthesize_sample(event_copy, sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
return perf_event__repipe_synth(tool, event_copy);
}
@@ -2440,12 +2494,25 @@ static int __cmd_inject(struct perf_inject *inject)
* synthesized hardware events, so clear the feature flag.
*/
if (inject->itrace_synth_opts.set) {
+ struct evsel *evsel;
+
perf_header__clear_feat(&session->header,
HEADER_AUXTRACE);
- if (inject->itrace_synth_opts.last_branch ||
- inject->itrace_synth_opts.add_last_branch)
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+ }
+
+ if (inject->itrace_synth_opts.add_last_branch) {
perf_header__set_feat(&session->header,
HEADER_BRANCH_STACK);
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ evsel->core.attr.branch_sample_type |=
+ PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ }
}
/*
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 5142983e3243..2dce6106c038 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1731,6 +1731,7 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
static int intel_pt_inject_event(union perf_event *event,
struct perf_sample *sample, u64 type)
{
+ event->header.type = PERF_RECORD_SAMPLE;
event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
/*branch_sample_type=*/0);
@@ -2489,7 +2490,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
intel_pt_add_xmm(intr_regs, pos, items, regs_mask);
}
- if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) {
if (items->mask[INTEL_PT_LBR_0_POS] ||
items->mask[INTEL_PT_LBR_1_POS] ||
items->mask[INTEL_PT_LBR_2_POS]) {
@@ -2560,7 +2561,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
sample.transaction = txn;
}
- ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type);
+ ret = intel_pt_deliver_synth_event(pt, event, &sample,
+ sample_type | evsel->synth_sample_type);
perf_sample__exit(&sample);
return ret;
}
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis
2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
@ 2026-04-28 20:20 ` sashiko-bot
0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-04-28 20:20 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] Silent data corruption due to removed error check in perf_event__convert_sample_callchain().
- [High] Pipe mode attribute headers incorrectly advertise PERF_SAMPLE_AUX, breaking downstream parsers.
- [High] Out-of-bounds write and parser breakage in perf_event__repipe_attr().
- [High] Unadvertised `size` field left in payload by perf_inject__cut_auxtrace_sample() breaks parsing.
- [High] Off-by-one error in sample size overflow check causes infinite parsing loops.
- [High] Dangling pointer vulnerability via `sample->branch_stack` stack variable assignment.
- [High] `hw_idx` omission in PEBS synthesis causes header vs payload mismatch.
--
commit e339fa2584e0f4ca5e2c4e8eb68d763fd40fbc59
Author: Ian Rogers <irogers@google.com>
perf inject: Fix itrace branch stack synthesis
This patch fixes several issues with perf inject --itrace=L when synthesizing
branch stacks from AUX data. It adjusts the sample types and event layout,
and addresses parsing issues and potential heap overflows.
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
[ ... ]
> @@ -223,6 +223,11 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
> tool);
> int ret;
>
> + if (inject->itrace_synth_opts.add_last_branch) {
> + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
> + }
> +
Does modifying event->attr.attr.sample_type before calling
perf_event__process_attr() risk an out-of-bounds write if the event buffer
is truncated?
Also, this populates the session's internal evsel with
PERF_SAMPLE_BRANCH_STACK, which might cause perf_evsel__parse_sample() to
fail on subsequent raw records that lack this data.
For pipe mode, this function dynamically emits headers but doesn't clear
PERF_SAMPLE_AUX from the repiped event->attr. Since the payload has the AUX
data stripped, does this cause downstream parsers to misinterpret event
boundaries because the header advertises data that isn't there?
> ret = perf_event__process_attr(tool, event, pevlist);
[ ... ]
> @@ -375,7 +380,60 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
>
> build_id__mark_dso_hit(tool, event, sample, evsel, machine);
>
> - if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
> + if (inject->itrace_synth_opts.set &&
> + (inject->itrace_synth_opts.last_branch ||
> + inject->itrace_synth_opts.add_last_branch)) {
> + union perf_event *event_copy = (void *)inject->event_copy;
> + struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 };
> + int err;
[ ... ]
> + if (!sample->branch_stack)
> + sample->branch_stack = &dummy_bs;
If PERF_FORMAT_GROUP is used, could this leave a dangling pointer?
If deliver_sample_group() reuses the exact same sample structure for
subsequent deliveries, sample->branch_stack will point to dummy_bs which
is allocated on the stack and goes out of scope when this function returns.
> +
> + if (inject->itrace_synth_opts.add_last_branch) {
> + /* Temporarily add in type bits for synthesis. */
> + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> + evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
> + }
> + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
> +
> + sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type,
> + evsel->core.attr.read_format,
> + evsel->core.attr.branch_sample_type);
> +
> + if (sz > PERF_SAMPLE_MAX_SIZE) {
Does this allow sz to be exactly PERF_SAMPLE_MAX_SIZE (65536)?
If sz evaluates to 65536, assigning it to event_copy->header.size (a __u16)
would overflow and wrap to 0. Emitting an event with header.size = 0 can
cause downstream parsers to enter an infinite loop.
Should this check be >= PERF_SAMPLE_MAX_SIZE?
> + pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
> + return -EFAULT;
> + }
[ ... ]
> + if (err) {
> + pr_err("Failed to synthesize sample\n");
> + return err;
> + }
> + event = event_copy;
> + } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
> event = perf_inject__cut_auxtrace_sample(inject, event, sample);
This fallback leaves an 8-byte size field (set to 0) in the sample payload.
Since PERF_SAMPLE_AUX is now globally cleared from the evsel headers in
__cmd_inject(), will parsers misinterpret this 8-byte field as the next
field in the format (like DATA_SRC) and shift all subsequent fields?
> if (IS_ERR(event))
> return PTR_ERR(event);
[ ... ]
> @@ -463,13 +521,9 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
> /* remove sample_type {STACK,REGS}_USER for synthesize */
> sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
>
> - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
> - evsel->core.attr.read_format,
> - evsel->core.attr.branch_sample_type, sample);
> - if (ret) {
> - pr_err("Failed to synthesize sample\n");
> - return ret;
> - }
> + perf_event__synthesize_sample(event_copy, sample_type,
> + evsel->core.attr.read_format,
> + evsel->core.attr.branch_sample_type, sample);
> return perf_event__repipe_synth(tool, event_copy);
Is it safe to remove the error check here?
If synthesis fails and returns a negative error code, event_copy might
remain uninitialized. Unconditionally passing it to perf_event__repipe_synth()
could write malformed or uninitialized memory to the output stream.
> }
[ ... ]
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
[ ... ]
> @@ -1731,6 +1731,7 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
> static int intel_pt_inject_event(union perf_event *event,
> struct perf_sample *sample, u64 type)
> {
> + event->header.type = PERF_RECORD_SAMPLE;
> event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
> /*branch_sample_type=*/0);
>
Since __cmd_inject() globally sets PERF_SAMPLE_BRANCH_HW_INDEX on all evsels
when add_last_branch is true, does passing branch_sample_type=0 here omit
the hw_idx field from synthesized PEBS samples?
Downstream parsers might misalign the payload if the file header advertises
its presence but the payload omits it.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260428070328.1880314-1-irogers@google.com?part=2
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks
2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
@ 2026-04-28 23:19 ` Namhyung Kim
0 siblings, 0 replies; 10+ messages in thread
From: Namhyung Kim @ 2026-04-28 23:19 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria,
linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi
On Tue, Apr 28, 2026 at 12:03:27AM -0700, Ian Rogers wrote:
> Synthesizing branch stacks for Intel-PT highlighed an issue where
> PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the
> perf_event_attr branch_sample_type. This caused an incorrect size
> calculation.
>
> Fix the writing of the nr and hw_idx values during sample event
> synthesis.
>
> Assisted-by: Gemini:gemini-3.1-pro-preview
> Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
> ---
> tools/perf/bench/inject-buildid.c | 9 ++++++---
> tools/perf/builtin-inject.c | 12 +++++++++---
> tools/perf/tests/dlfilter-test.c | 8 ++++++--
> tools/perf/tests/sample-parsing.c | 5 +++--
> tools/perf/util/arm-spe.c | 7 +++++--
> tools/perf/util/cs-etm.c | 6 ++++--
> tools/perf/util/intel-bts.c | 3 ++-
> tools/perf/util/intel-pt.c | 7 +++++--
> tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++-------
> tools/perf/util/synthetic-events.h | 6 ++++--
> 10 files changed, 62 insertions(+), 26 deletions(-)
>
> diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
> index aad572a78d7f..bfd2c5ec9488 100644
> --- a/tools/perf/bench/inject-buildid.c
> +++ b/tools/perf/bench/inject-buildid.c
> @@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso,
>
> event.header.type = PERF_RECORD_SAMPLE;
> event.header.misc = PERF_RECORD_MISC_USER;
> - event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
> -
> - perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
> + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type,
> + /*read_format=*/0,
> + /*branch_sample_type=*/0);
> + perf_event__synthesize_sample(&event, bench_sample_type,
> + /*read_format=*/0,
> + /*branch_sample_type=*/0, &sample);
>
> return writen(data->input_pipe[1], &event, event.header.size);
> }
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index f174bc69cec4..0c51cb4250d1 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
> /* remove sample_type {STACK,REGS}_USER for synthesize */
> sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
>
> - perf_event__synthesize_sample(event_copy, sample_type,
> - evsel->core.attr.read_format, sample);
> + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
> + evsel->core.attr.read_format,
> + evsel->core.attr.branch_sample_type, sample);
> + if (ret) {
> + pr_err("Failed to synthesize sample\n");
> + return ret;
> + }
> return perf_event__repipe_synth(tool, event_copy);
> }
>
> @@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
> sample_sw.period = sample->period;
> sample_sw.time = sample->time;
> perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
> - evsel->core.attr.read_format, &sample_sw);
> + evsel->core.attr.read_format,
> + evsel->core.attr.branch_sample_type, &sample_sw);
> build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
> ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
> perf_sample__exit(&sample_sw);
> diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c
> index e63790c61d53..204663571943 100644
> --- a/tools/perf/tests/dlfilter-test.c
> +++ b/tools/perf/tests/dlfilter-test.c
> @@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid
>
> event->header.type = PERF_RECORD_SAMPLE;
> event->header.misc = PERF_RECORD_MISC_USER;
> - event->header.size = perf_event__sample_event_size(&sample, sample_type, 0);
> - err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
> + event->header.size = perf_event__sample_event_size(&sample, sample_type,
> + /*read_format=*/0,
> + /*branch_sample_type=*/0);
> + err = perf_event__synthesize_sample(event, sample_type,
> + /*read_format=*/0,
> + /*branch_sample_type=*/0, &sample);
> if (err)
> return test_result("perf_event__synthesize_sample() failed", TEST_FAIL);
>
> diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
> index a7327c942ca2..55f0b73ca20e 100644
> --- a/tools/perf/tests/sample-parsing.c
> +++ b/tools/perf/tests/sample-parsing.c
> @@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
> sample.read.one.lost = 1;
> }
>
> - sz = perf_event__sample_event_size(&sample, sample_type, read_format);
> + sz = perf_event__sample_event_size(&sample, sample_type, read_format,
> + evsel.core.attr.branch_sample_type);
> bufsz = sz + 4096; /* Add a bit for overrun checking */
> event = malloc(bufsz);
> if (!event) {
> @@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
> event->header.size = sz;
>
> err = perf_event__synthesize_sample(event, sample_type, read_format,
> - &sample);
> + evsel.core.attr.branch_sample_type, &sample);
> if (err) {
> pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
> "perf_event__synthesize_sample", sample_type, err);
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index e5835042acdf..c4ed9f10e731 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
>
> static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
> {
> - event->header.size = perf_event__sample_event_size(sample, type, 0);
> - return perf_event__synthesize_sample(event, type, 0, sample);
> + event->header.type = PERF_RECORD_SAMPLE;
> + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
> + /*branch_sample_type=*/0);
> + return perf_event__synthesize_sample(event, type, /*read_format=*/0,
> + /*branch_sample_type=*/0, sample);
> }
>
> static inline int
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 8a639d2e51a4..1ebc1a6a5e75 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
> static int cs_etm__inject_event(union perf_event *event,
> struct perf_sample *sample, u64 type)
> {
> - event->header.size = perf_event__sample_event_size(sample, type, 0);
> - return perf_event__synthesize_sample(event, type, 0, sample);
> + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
> + /*branch_sample_type=*/0);
> + return perf_event__synthesize_sample(event, type, /*read_format=*/0,
> + /*branch_sample_type=*/0, sample);
> }
>
>
> diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
> index 382255393fb3..0b18ebd13f7c 100644
> --- a/tools/perf/util/intel-bts.c
> +++ b/tools/perf/util/intel-bts.c
> @@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
> event.sample.header.size = bts->branches_event_size;
> ret = perf_event__synthesize_sample(&event,
> bts->branches_sample_type,
> - 0, &sample);
> + /*read_format=*/0, /*branch_sample_type=*/0,
> + &sample);
> if (ret)
> return ret;
> }
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index fc9eec8b54b8..5142983e3243 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
> static int intel_pt_inject_event(union perf_event *event,
> struct perf_sample *sample, u64 type)
> {
> - event->header.size = perf_event__sample_event_size(sample, type, 0);
> - return perf_event__synthesize_sample(event, type, 0, sample);
> + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
> + /*branch_sample_type=*/0);
> +
> + return perf_event__synthesize_sample(event, type, /*read_format=*/0,
> + /*branch_sample_type=*/0, sample);
> }
>
> static inline int intel_pt_opt_inject(struct intel_pt *pt,
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 85bee747f4cd..2461f25a4d7d 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool,
> return process(tool, (union perf_event *) &event, NULL, machine);
> }
>
> -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format)
> +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format,
> + u64 branch_sample_type)
> {
> size_t sz, result = sizeof(struct perf_record_sample);
>
> @@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
>
> if (type & PERF_SAMPLE_BRANCH_STACK) {
> sz = sample->branch_stack->nr * sizeof(struct branch_entry);
> - /* nr, hw_idx */
> - sz += 2 * sizeof(u64);
> + /* nr */
> + sz += sizeof(u64);
> + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
> + sz += sizeof(u64);
> result += sz;
> }
>
> @@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
> }
>
> int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
> - const struct perf_sample *sample)
> + u64 branch_sample_type, const struct perf_sample *sample)
> {
> __u64 *array;
> size_t sz;
> @@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
>
> if (type & PERF_SAMPLE_BRANCH_STACK) {
> sz = sample->branch_stack->nr * sizeof(struct branch_entry);
> - /* nr, hw_idx */
> - sz += 2 * sizeof(u64);
> - memcpy(array, sample->branch_stack, sz);
> +
> + *array++ = sample->branch_stack->nr;
> +
> + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) {
> + if (sample->no_hw_idx)
> + *array++ = 0;
> + else
> + *array++ = sample->branch_stack->hw_idx;
> + }
> +
> + memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz);
> array = (void *)array + sz;
> }
>
> diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
> index b0edad0c3100..8c7f49f9ccf5 100644
> --- a/tools/perf/util/synthetic-events.h
> +++ b/tools/perf/util/synthetic-events.h
> @@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_
> int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
> int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
> int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
> -int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
> +int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
> + u64 branch_sample_type, const struct perf_sample *sample);
> int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
> int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
> int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine);
> @@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session,
>
> int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process);
>
> -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format);
> +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
> + u64 read_format, u64 branch_sample_type);
>
> int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool,
> struct target *target, struct perf_thread_map *threads,
> --
> 2.54.0.545.g6539524ca2-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes
2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
@ 2026-04-29 18:11 ` Ian Rogers
2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
2 siblings, 2 replies; 10+ messages in thread
From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw)
To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon
Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo,
peterz, ravi.bangoria, Ian Rogers
An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
v2: Response to sashiko fixes for patch 2, Namhyung's acked-by for patch 1.
v1: https://lore.kernel.org/lkml/20260428070328.1880314-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 ++-
tools/perf/builtin-inject.c | 107 ++++++++++++++++++++++++++---
tools/perf/tests/dlfilter-test.c | 8 ++-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 7 +-
tools/perf/util/cs-etm.c | 6 +-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 26 +++++--
tools/perf/util/synthetic-events.c | 25 +++++--
tools/perf/util/synthetic-events.h | 6 +-
10 files changed, 165 insertions(+), 37 deletions(-)
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks
2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
@ 2026-04-29 18:11 ` Ian Rogers
2026-04-29 20:51 ` sashiko-bot
2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
1 sibling, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw)
To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon
Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo,
peterz, ravi.bangoria, Ian Rogers
Synthesizing branch stacks for Intel-PT highlighed an issue where
PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the
perf_event_attr branch_sample_type. This caused an incorrect size
calculation.
Fix the writing of the nr and hw_idx values during sample event
synthesis.
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/bench/inject-buildid.c | 9 ++++++---
tools/perf/builtin-inject.c | 12 +++++++++---
tools/perf/tests/dlfilter-test.c | 8 ++++++--
tools/perf/tests/sample-parsing.c | 5 +++--
tools/perf/util/arm-spe.c | 7 +++++--
tools/perf/util/cs-etm.c | 6 ++++--
tools/perf/util/intel-bts.c | 3 ++-
tools/perf/util/intel-pt.c | 7 +++++--
tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++-------
tools/perf/util/synthetic-events.h | 6 ++++--
10 files changed, 62 insertions(+), 26 deletions(-)
diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
index aad572a78d7f..bfd2c5ec9488 100644
--- a/tools/perf/bench/inject-buildid.c
+++ b/tools/perf/bench/inject-buildid.c
@@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso,
event.header.type = PERF_RECORD_SAMPLE;
event.header.misc = PERF_RECORD_MISC_USER;
- event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
-
- perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
+ event.header.size = perf_event__sample_event_size(&sample, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ perf_event__synthesize_sample(&event, bench_sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
return writen(data->input_pipe[1], &event, event.header.size);
}
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index f174bc69cec4..0c51cb4250d1 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
- perf_event__synthesize_sample(event_copy, sample_type,
- evsel->core.attr.read_format, sample);
+ ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+ if (ret) {
+ pr_err("Failed to synthesize sample\n");
+ return ret;
+ }
return perf_event__repipe_synth(tool, event_copy);
}
@@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool,
sample_sw.period = sample->period;
sample_sw.time = sample->time;
perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type,
- evsel->core.attr.read_format, &sample_sw);
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, &sample_sw);
build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine);
ret = perf_event__repipe(tool, event_sw, &sample_sw, machine);
perf_sample__exit(&sample_sw);
diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c
index e63790c61d53..204663571943 100644
--- a/tools/perf/tests/dlfilter-test.c
+++ b/tools/perf/tests/dlfilter-test.c
@@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid
event->header.type = PERF_RECORD_SAMPLE;
event->header.misc = PERF_RECORD_MISC_USER;
- event->header.size = perf_event__sample_event_size(&sample, sample_type, 0);
- err = perf_event__synthesize_sample(event, sample_type, 0, &sample);
+ event->header.size = perf_event__sample_event_size(&sample, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ err = perf_event__synthesize_sample(event, sample_type,
+ /*read_format=*/0,
+ /*branch_sample_type=*/0, &sample);
if (err)
return test_result("perf_event__synthesize_sample() failed", TEST_FAIL);
diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index a7327c942ca2..55f0b73ca20e 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
sample.read.one.lost = 1;
}
- sz = perf_event__sample_event_size(&sample, sample_type, read_format);
+ sz = perf_event__sample_event_size(&sample, sample_type, read_format,
+ evsel.core.attr.branch_sample_type);
bufsz = sz + 4096; /* Add a bit for overrun checking */
event = malloc(bufsz);
if (!event) {
@@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
event->header.size = sz;
err = perf_event__synthesize_sample(event, sample_type, read_format,
- &sample);
+ evsel.core.attr.branch_sample_type, &sample);
if (err) {
pr_debug("%s failed for sample_type %#"PRIx64", error %d\n",
"perf_event__synthesize_sample", sample_type, err);
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index e5835042acdf..c4ed9f10e731 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq)
static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ event->header.type = PERF_RECORD_SAMPLE;
+ event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ /*branch_sample_type=*/0, sample);
}
static inline int
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8a639d2e51a4..1ebc1a6a5e75 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq,
static int cs_etm__inject_event(union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ /*branch_sample_type=*/0);
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ /*branch_sample_type=*/0, sample);
}
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 382255393fb3..0b18ebd13f7c 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
event.sample.header.size = bts->branches_event_size;
ret = perf_event__synthesize_sample(&event,
bts->branches_sample_type,
- 0, &sample);
+ /*read_format=*/0, /*branch_sample_type=*/0,
+ &sample);
if (ret)
return ret;
}
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index fc9eec8b54b8..5142983e3243 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
static int intel_pt_inject_event(union perf_event *event,
struct perf_sample *sample, u64 type)
{
- event->header.size = perf_event__sample_event_size(sample, type, 0);
- return perf_event__synthesize_sample(event, type, 0, sample);
+ event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
+ /*branch_sample_type=*/0);
+
+ return perf_event__synthesize_sample(event, type, /*read_format=*/0,
+ /*branch_sample_type=*/0, sample);
}
static inline int intel_pt_opt_inject(struct intel_pt *pt,
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 85bee747f4cd..2461f25a4d7d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool,
return process(tool, (union perf_event *) &event, NULL, machine);
}
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format)
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format,
+ u64 branch_sample_type)
{
size_t sz, result = sizeof(struct perf_record_sample);
@@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
+ /* nr */
+ sz += sizeof(u64);
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)
+ sz += sizeof(u64);
result += sz;
}
@@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format,
}
int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
- const struct perf_sample *sample)
+ u64 branch_sample_type, const struct perf_sample *sample)
{
__u64 *array;
size_t sz;
@@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
if (type & PERF_SAMPLE_BRANCH_STACK) {
sz = sample->branch_stack->nr * sizeof(struct branch_entry);
- /* nr, hw_idx */
- sz += 2 * sizeof(u64);
- memcpy(array, sample->branch_stack, sz);
+
+ *array++ = sample->branch_stack->nr;
+
+ if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) {
+ if (sample->no_hw_idx)
+ *array++ = 0;
+ else
+ *array++ = sample->branch_stack->hw_idx;
+ }
+
+ memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz);
array = (void *)array + sz;
}
diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h
index b0edad0c3100..8c7f49f9ccf5 100644
--- a/tools/perf/util/synthetic-events.h
+++ b/tools/perf/util/synthetic-events.h
@@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_
int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine);
-int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample);
+int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format,
+ u64 branch_sample_type, const struct perf_sample *sample);
int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine);
int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs);
int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine);
@@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session,
int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process);
-size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format);
+size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
+ u64 read_format, u64 branch_sample_type);
int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool,
struct target *target, struct perf_thread_map *threads,
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis
2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
@ 2026-04-29 18:11 ` Ian Rogers
2026-04-29 21:18 ` sashiko-bot
1 sibling, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw)
To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon
Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo,
peterz, ravi.bangoria, Ian Rogers
When using "perf inject --itrace=L" to synthesize branch stacks from
AUX data, several issues caused failures with the generated file:
1. The synthesized samples were delivered without the
PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's
sample_type. Fixed by using sample_type | evsel->synth_sample_type
in intel_pt_do_synth_pebs_sample.
2. Modifying evsel->core.attr.sample_type early in __cmd_inject caused
parse failures for subsequent records in the input file. Fixed by
moving this modification to just before writing the header.
3. perf_event__repipe_sample was narrowed to only synthesize samples
when branch stack injection was requested, and restored the use of
perf_inject__cut_auxtrace_sample as a fallback to preserve
functionality.
4. Potential Heap Overflow in perf_event__repipe_sample: Addressed by
adding a check that prints an error and returns -EFAULT if the
calculated event size exceeds PERF_SAMPLE_MAX_SIZE.
5. Header vs Payload Mismatch in __cmd_inject: Addressed by narrowing
the condition so that HEADER_BRANCH_STACK is only set in the file
header if add_last_branch was true.
6. NULL Pointer Dereference in intel-pt.c: Addressed by updating the
condition in intel_pt_do_synth_pebs_sample to fill sample.
branch_stack if it was synthesized, even if not in the original
sample_type.
7. Modifying event attributes in perf_event__repipe_attr early caused
downstream parser breakage in pipe mode. Fixed by clearing
PERF_SAMPLE_AUX when itrace_synth_opts.set is true to ensure the
headers match the stripped payload. Also added a size check against
struct perf_record_header_attr to prevent out-of-bounds writes.
8. Potential dangling pointer vulnerability in perf_event__repipe_sample:
Addressed by restoring the original sample->branch_stack pointer
before returning.
9. Off-by-one error in sample size check in perf_event__repipe_sample:
Fixed by checking if sz >= PERF_SAMPLE_MAX_SIZE instead of >.
10. Unadvertised size field left in payload by cut_auxtrace_sample:
Addressed by excluding the 8-byte size field from the copied
payload to correctly match the cleared PERF_SAMPLE_AUX bit.
11. Silent data corruption in convert_sample_callchain: Addressed by
restoring the error check for perf_event__synthesize_sample to
prevent writing malformed or uninitialized memory.
12. Omission of hw_idx in PEBS synthesis: Fixed by passing the correct
evsel->core.attr.branch_sample_type instead of 0 in intel-pt.c.
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
v2:
- Fix error handling in callchain conversion by keeping the error check.
- Fix header advertise issues in pipe mode by clearing PERF_SAMPLE_AUX.
- Prevent out-of-bounds writes in attribute repiping with size checks.
- Fix sample payload misalignment by cutting the unadvertised AUX size field.
- Fix off-by-one check in sample size overflow to correctly use >=.
- Avoid dangling pointer vulnerability by restoring the branch_stack pointer.
- Pass evsel's branch_sample_type to synthesized samples to prevent hw_idx omission.
---
tools/perf/builtin-inject.c | 97 ++++++++++++++++++++++++++++++++++---
tools/perf/util/intel-pt.c | 23 ++++++---
2 files changed, 106 insertions(+), 14 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 0c51cb4250d1..13fb1f05ceb2 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -223,6 +223,19 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
tool);
int ret;
+ if (event->header.size < sizeof(struct perf_record_header_attr)) {
+ pr_err("Attribute event size %u is too small\n", event->header.size);
+ return -EINVAL;
+ }
+
+ if (inject->itrace_synth_opts.set)
+ event->attr.attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+
ret = perf_event__process_attr(tool, event, pevlist);
if (ret)
return ret;
@@ -330,8 +343,8 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
union perf_event *event,
struct perf_sample *sample)
{
- size_t sz1 = sample->aux_sample.data - (void *)event;
- size_t sz2 = event->header.size - sample->aux_sample.size - sz1;
+ size_t sz1 = sample->aux_sample.data - (void *)event - sizeof(u64);
+ size_t sz2 = event->header.size - sample->aux_sample.size - (sz1 + sizeof(u64));
union perf_event *ev;
if (inject->event_copy == NULL) {
@@ -342,13 +355,12 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject,
ev = (union perf_event *)inject->event_copy;
if (sz1 > event->header.size || sz2 > event->header.size ||
sz1 + sz2 > event->header.size ||
- sz1 < sizeof(struct perf_event_header) + sizeof(u64))
+ sz1 < sizeof(struct perf_event_header))
return event;
memcpy(ev, event, sz1);
memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2);
ev->header.size = sz1 + sz2;
- ((u64 *)((void *)ev + sz1))[-1] = 0;
return ev;
}
@@ -375,7 +387,63 @@ static int perf_event__repipe_sample(const struct perf_tool *tool,
build_id__mark_dso_hit(tool, event, sample, evsel, machine);
- if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
+ if (inject->itrace_synth_opts.set &&
+ (inject->itrace_synth_opts.last_branch ||
+ inject->itrace_synth_opts.add_last_branch)) {
+ union perf_event *event_copy = (void *)inject->event_copy;
+ struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 };
+ int err;
+ size_t sz;
+ u64 orig_type = evsel->core.attr.sample_type;
+ u64 orig_branch_type = evsel->core.attr.branch_sample_type;
+
+ struct branch_stack *orig_bs = sample->branch_stack;
+
+ if (event_copy == NULL) {
+ inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE);
+ if (!inject->event_copy)
+ return -ENOMEM;
+
+ event_copy = (void *)inject->event_copy;
+ }
+
+ if (!sample->branch_stack)
+ sample->branch_stack = &dummy_bs;
+
+ if (inject->itrace_synth_opts.add_last_branch) {
+ /* Temporarily add in type bits for synthesis. */
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+
+ sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type);
+
+ if (sz >= PERF_SAMPLE_MAX_SIZE) {
+ pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE);
+ return -EFAULT;
+ }
+
+ event_copy->header.type = PERF_RECORD_SAMPLE;
+ event_copy->header.misc = event->header.misc;
+ event_copy->header.size = sz;
+
+ err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ evsel->core.attr.read_format,
+ evsel->core.attr.branch_sample_type, sample);
+
+ evsel->core.attr.sample_type = orig_type;
+ evsel->core.attr.branch_sample_type = orig_branch_type;
+ sample->branch_stack = orig_bs;
+
+ if (err) {
+ pr_err("Failed to synthesize sample\n");
+ return err;
+ }
+ event = event_copy;
+ } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) {
event = perf_inject__cut_auxtrace_sample(inject, event, sample);
if (IS_ERR(event))
return PTR_ERR(event);
@@ -463,7 +531,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
/* remove sample_type {STACK,REGS}_USER for synthesize */
sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
- ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
+ ret = perf_event__synthesize_sample(event_copy, sample_type,
evsel->core.attr.read_format,
evsel->core.attr.branch_sample_type, sample);
if (ret) {
@@ -2440,12 +2508,25 @@ static int __cmd_inject(struct perf_inject *inject)
* synthesized hardware events, so clear the feature flag.
*/
if (inject->itrace_synth_opts.set) {
+ struct evsel *evsel;
+
perf_header__clear_feat(&session->header,
HEADER_AUXTRACE);
- if (inject->itrace_synth_opts.last_branch ||
- inject->itrace_synth_opts.add_last_branch)
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX;
+ }
+
+ if (inject->itrace_synth_opts.add_last_branch) {
perf_header__set_feat(&session->header,
HEADER_BRANCH_STACK);
+
+ evlist__for_each_entry(session->evlist, evsel) {
+ evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+ evsel->core.attr.branch_sample_type |=
+ PERF_SAMPLE_BRANCH_HW_INDEX;
+ }
+ }
}
/*
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 5142983e3243..f45dbdd4d323 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1728,14 +1728,24 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt,
event->sample.header.misc = sample->cpumode;
}
-static int intel_pt_inject_event(union perf_event *event,
+static int intel_pt_inject_event(struct intel_pt *pt, union perf_event *event,
struct perf_sample *sample, u64 type)
{
+ struct evsel *evsel = NULL;
+ u64 branch_sample_type = 0;
+
+ if (pt->session && pt->session->evlist)
+ evsel = evlist__id2evsel(pt->session->evlist, sample->id);
+
+ if (evsel)
+ branch_sample_type = evsel->core.attr.branch_sample_type;
+
+ event->header.type = PERF_RECORD_SAMPLE;
event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
- /*branch_sample_type=*/0);
+ branch_sample_type);
return perf_event__synthesize_sample(event, type, /*read_format=*/0,
- /*branch_sample_type=*/0, sample);
+ branch_sample_type, sample);
}
static inline int intel_pt_opt_inject(struct intel_pt *pt,
@@ -1745,7 +1755,7 @@ static inline int intel_pt_opt_inject(struct intel_pt *pt,
if (!pt->synth_opts.inject)
return 0;
- return intel_pt_inject_event(event, sample, type);
+ return intel_pt_inject_event(pt, event, sample, type);
}
static int intel_pt_deliver_synth_event(struct intel_pt *pt,
@@ -2489,7 +2499,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
intel_pt_add_xmm(intr_regs, pos, items, regs_mask);
}
- if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+ if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) {
if (items->mask[INTEL_PT_LBR_0_POS] ||
items->mask[INTEL_PT_LBR_1_POS] ||
items->mask[INTEL_PT_LBR_2_POS]) {
@@ -2560,7 +2570,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse
sample.transaction = txn;
}
- ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type);
+ ret = intel_pt_deliver_synth_event(pt, event, &sample,
+ sample_type | evsel->synth_sample_type);
perf_sample__exit(&sample);
return ret;
}
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks
2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
@ 2026-04-29 20:51 ` sashiko-bot
0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-04-29 20:51 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Synthesized branch stack samples in hardware tracers (arm-spe, cs-etm, intel-pt) omit the hw_idx field despite the event attribute claiming its presence, causing parsers to read misaligned data.
- [High] In perf_event__convert_sample_callchain(), passing the unmasked evsel->core.attr.sample_type causes the generated payload to serialize stack and register data out of sync with the reduced header.size, leading to data stream corruption.
--
commit eb27565513c2df3df4959ab5f50b38abd090fa86
Author: Ian Rogers <irogers@google.com>
perf event: Fix size of synthesized sample with branch stacks
This commit updates sample event synthesis to correctly size and write the
nr and hw_idx values for branch stacks by explicitly checking the
PERF_SAMPLE_BRANCH_HW_INDEX flag.
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
[ ... ]
> @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
> /* remove sample_type {STACK,REGS}_USER for synthesize */
> sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
>
> - perf_event__synthesize_sample(event_copy, sample_type,
> - evsel->core.attr.read_format, sample);
> + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
> + evsel->core.attr.read_format,
> + evsel->core.attr.branch_sample_type, sample);
I noticed this is fixed later in the patch series by commit "perf inject:
Fix itrace branch stack synthesis".
Does passing the unmasked evsel->core.attr.sample_type here instead of the
local masked sample_type variable cause the generated payload to serialize
stack and register data out of sync with the reduced header size?
[ ... ]
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
[ ... ]
> static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type)
> {
> - event->header.size = perf_event__sample_event_size(sample, type, 0);
> - return perf_event__synthesize_sample(event, type, 0, sample);
> + event->header.type = PERF_RECORD_SAMPLE;
> + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
> + /*branch_sample_type=*/0);
> + return perf_event__synthesize_sample(event, type, /*read_format=*/0,
> + /*branch_sample_type=*/0, sample);
> }
I see this is fixed for intel-pt.c later in the series by commit
"perf inject: Fix itrace branch stack synthesis", but it appears it remains
unfixed for arm-spe.c and cs-etm.c.
If branch_sample_type is hardcoded to 0 here, will it omit the hw_idx field
even if the synthesized event attribute claims it is present?
If so, will parsers reading the payload misinterpret the first 8 bytes of
the branch entries as the hw_idx field, shifting subsequent sample fields?
[ ... ]
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
[ ... ]
> static int cs_etm__inject_event(union perf_event *event,
> struct perf_sample *sample, u64 type)
> {
> - event->header.size = perf_event__sample_event_size(sample, type, 0);
> - return perf_event__synthesize_sample(event, type, 0, sample);
> + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0,
> + /*branch_sample_type=*/0);
> + return perf_event__synthesize_sample(event, type, /*read_format=*/0,
> + /*branch_sample_type=*/0, sample);
> }
Does this function suffer from the same issue, where hardcoding
branch_sample_type to 0 omits the hw_idx field and causes misaligned reads
during parsing?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260429181136.2712655-1-irogers@google.com?part=1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis
2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
@ 2026-04-29 21:18 ` sashiko-bot
0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-04-29 21:18 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-users
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Modifying `event->attr.attr` in-place in `perf_event__repipe_attr` causes SIGSEGV and completely breaks sample parsing.
- [High] The size check in `perf_event__repipe_attr` incorrectly rejects valid older files and causes out-of-bounds writes.
- [High] Flawed `sample_size` recalculation in `perf_event__convert_sample_callchain` causes heap memory leaks and malformed output.
--
commit 64e2d17135bd2355121cf62d57be79321e910c46
Author: Ian Rogers <irogers@google.com>
perf inject: Fix itrace branch stack synthesis
This commit addresses several issues with synthesizing branch stacks from
AUX data when using "perf inject --itrace=L". It modifies sample types early
to handle missing branch stack flags and fixes sample synthesis logic.
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 0c51cb4250d1..13fb1f05ceb2 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -223,6 +223,19 @@ static int perf_event__repipe_attr(const struct perf_tool *tool,
> tool);
> int ret;
>
> + if (event->header.size < sizeof(struct perf_record_header_attr)) {
Does this size check incorrectly reject valid older files?
Older perf.data files have a smaller attr.size (e.g., 64 bytes). Valid
events from older kernels may have a total size of 72 bytes, triggering
this check and causing perf inject to wrongly reject them.
If an older event has enough trailing IDs to pass the size check, the struct
ends at offset 64 and the id[] array begins at offset 72. Modifying
event->attr.attr.branch_sample_type writes to offset 80, which overlaps with
the id[] array and can corrupt the event ID data.
> + pr_err("Attribute event size %u is too small\n", event->header.size);
> + return -EINVAL;
> + }
> +
> + if (inject->itrace_synth_opts.set)
> + event->attr.attr.sample_type &= ~PERF_SAMPLE_AUX;
> +
> + if (inject->itrace_synth_opts.add_last_branch) {
> + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
> + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX;
> + }
Could modifying event->attr.attr in-place here cause a segmentation fault?
When reading from a standard perf.data file, perf_session__process_events()
mmaps the file into memory using PROT_READ. The event pointer passed to
perf_event__repipe_attr() points directly into this read-only buffer, so
writing to event->attr.attr.sample_type could immediately trigger a SIGSEGV.
Even when reading from a pipe where the buffer might be writable, applying
these modifications before calling perf_event__process_attr() alters the
flags received by the internal evsel. The sample records arriving next
in the stream still physically contain the original layout, which causes
evsel__parse_sample() to misinterpret payload offsets and abort the session.
> ret = perf_event__process_attr(tool, event, pevlist);
[ ... ]
> @@ -463,7 +531,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool,
> /* remove sample_type {STACK,REGS}_USER for synthesize */
> sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER);
>
> - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type,
> + ret = perf_event__synthesize_sample(event_copy, sample_type,
This wasn't introduced by this patch, but does the sample_size recalculation
just above this cause heap memory leaks and malformed output?
When adjusting the size of the synthesized sample, the function adds the
size of the newly injected callchain but fails to subtract the size of the
original callchain present in the event. Additionally, when removing
STACK_USER, it only subtracts the payload size, failing to remove the 16 bytes
of length prefix and suffix metadata.
This heavily inflates the calculated sample_size. While
perf_event__synthesize_sample() correctly writes the smaller, actual data to
the heap buffer, perf_event__repipe_synth() writes the inflated sample_size
to the output pipe, leaking uninitialized heap memory into the file and
resulting in an oversized, malformed record.
> evsel->core.attr.read_format,
> evsel->core.attr.branch_sample_type, sample);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260429181136.2712655-1-irogers@google.com?part=2
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-29 21:18 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
2026-04-28 23:19 ` Namhyung Kim
2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
2026-04-28 20:20 ` sashiko-bot
2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers
2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
2026-04-29 20:51 ` sashiko-bot
2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
2026-04-29 21:18 ` sashiko-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox