* [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes
@ 2026-04-28 7:03 Ian Rogers
2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria,
linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi
Cc: Ian Rogers
An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 ++--
tools/perf/builtin-inject.c | 83 ++++++++++++++++++++++++++++--
tools/perf/tests/dlfilter-test.c | 8 ++-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 7 ++-
tools/perf/util/cs-etm.c | 6 ++-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 13 +++--
tools/perf/util/synthetic-events.c | 25 ++++++---
tools/perf/util/synthetic-events.h | 6 ++-
10 files changed, 135 insertions(+), 30 deletions(-)
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers @ 2026-04-28 7:03 ` Ian Rogers 2026-04-28 23:19 ` Namhyung Kim 2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2 siblings, 1 reply; 10+ messages in thread From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria, linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi Cc: Ian Rogers Synthesizing branch stacks for Intel-PT highlighed an issue where PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the perf_event_attr branch_sample_type. This caused an incorrect size calculation. Fix the writing of the nr and hw_idx values during sample event synthesis. Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/bench/inject-buildid.c | 9 ++++++--- tools/perf/builtin-inject.c | 12 +++++++++--- tools/perf/tests/dlfilter-test.c | 8 ++++++-- tools/perf/tests/sample-parsing.c | 5 +++-- tools/perf/util/arm-spe.c | 7 +++++-- tools/perf/util/cs-etm.c | 6 ++++-- tools/perf/util/intel-bts.c | 3 ++- tools/perf/util/intel-pt.c | 7 +++++-- tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++------- tools/perf/util/synthetic-events.h | 6 ++++-- 10 files changed, 62 insertions(+), 26 deletions(-) diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c index aad572a78d7f..bfd2c5ec9488 100644 --- a/tools/perf/bench/inject-buildid.c +++ b/tools/perf/bench/inject-buildid.c @@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso, event.header.type = PERF_RECORD_SAMPLE; event.header.misc = PERF_RECORD_MISC_USER; - event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0); - - perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample); + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + perf_event__synthesize_sample(&event, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); return writen(data->input_pipe[1], &event, event.header.size); } diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index f174bc69cec4..0c51cb4250d1 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - perf_event__synthesize_sample(event_copy, sample_type, - evsel->core.attr.read_format, sample); + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + if (ret) { + pr_err("Failed to synthesize sample\n"); + return ret; + } return perf_event__repipe_synth(tool, event_copy); } @@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool, sample_sw.period = sample->period; sample_sw.time = sample->time; perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type, - evsel->core.attr.read_format, &sample_sw); + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, &sample_sw); build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine); ret = perf_event__repipe(tool, event_sw, &sample_sw, machine); perf_sample__exit(&sample_sw); diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c index e63790c61d53..204663571943 100644 --- a/tools/perf/tests/dlfilter-test.c +++ b/tools/perf/tests/dlfilter-test.c @@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid event->header.type = PERF_RECORD_SAMPLE; event->header.misc = PERF_RECORD_MISC_USER; - event->header.size = perf_event__sample_event_size(&sample, sample_type, 0); - err = perf_event__synthesize_sample(event, sample_type, 0, &sample); + event->header.size = perf_event__sample_event_size(&sample, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + err = perf_event__synthesize_sample(event, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); if (err) return test_result("perf_event__synthesize_sample() failed", TEST_FAIL); diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c index a7327c942ca2..55f0b73ca20e 100644 --- a/tools/perf/tests/sample-parsing.c +++ b/tools/perf/tests/sample-parsing.c @@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) sample.read.one.lost = 1; } - sz = perf_event__sample_event_size(&sample, sample_type, read_format); + sz = perf_event__sample_event_size(&sample, sample_type, read_format, + evsel.core.attr.branch_sample_type); bufsz = sz + 4096; /* Add a bit for overrun checking */ event = malloc(bufsz); if (!event) { @@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) event->header.size = sz; err = perf_event__synthesize_sample(event, sample_type, read_format, - &sample); + evsel.core.attr.branch_sample_type, &sample); if (err) { pr_debug("%s failed for sample_type %#"PRIx64", error %d\n", "perf_event__synthesize_sample", sample_type, err); diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index e5835042acdf..c4ed9f10e731 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq) static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.type = PERF_RECORD_SAMPLE; + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8a639d2e51a4..1ebc1a6a5e75 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, static int cs_etm__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c index 382255393fb3..0b18ebd13f7c 100644 --- a/tools/perf/util/intel-bts.c +++ b/tools/perf/util/intel-bts.c @@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq, event.sample.header.size = bts->branches_event_size; ret = perf_event__synthesize_sample(&event, bts->branches_sample_type, - 0, &sample); + /*read_format=*/0, /*branch_sample_type=*/0, + &sample); if (ret) return ret; } diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index fc9eec8b54b8..5142983e3243 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, static int intel_pt_inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int intel_pt_opt_inject(struct intel_pt *pt, diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 85bee747f4cd..2461f25a4d7d 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool, return process(tool, (union perf_event *) &event, NULL, machine); } -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format) +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format, + u64 branch_sample_type) { size_t sz, result = sizeof(struct perf_record_sample); @@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); + /* nr */ + sz += sizeof(u64); + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) + sz += sizeof(u64); result += sz; } @@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format, } int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, - const struct perf_sample *sample) + u64 branch_sample_type, const struct perf_sample *sample) { __u64 *array; size_t sz; @@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); - memcpy(array, sample->branch_stack, sz); + + *array++ = sample->branch_stack->nr; + + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) { + if (sample->no_hw_idx) + *array++ = 0; + else + *array++ = sample->branch_stack->hw_idx; + } + + memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); array = (void *)array + sz; } diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h index b0edad0c3100..8c7f49f9ccf5 100644 --- a/tools/perf/util/synthetic-events.h +++ b/tools/perf/util/synthetic-events.h @@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_ int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); -int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample); +int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, + u64 branch_sample_type, const struct perf_sample *sample); int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); @@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session, int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process); -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format); +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, + u64 read_format, u64 branch_sample_type); int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool, struct target *target, struct perf_thread_map *threads, -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers @ 2026-04-28 23:19 ` Namhyung Kim 0 siblings, 0 replies; 10+ messages in thread From: Namhyung Kim @ 2026-04-28 23:19 UTC (permalink / raw) To: Ian Rogers Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria, linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi On Tue, Apr 28, 2026 at 12:03:27AM -0700, Ian Rogers wrote: > Synthesizing branch stacks for Intel-PT highlighed an issue where > PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the > perf_event_attr branch_sample_type. This caused an incorrect size > calculation. > > Fix the writing of the nr and hw_idx values during sample event > synthesis. > > Assisted-by: Gemini:gemini-3.1-pro-preview > Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Thanks, Namhyung > --- > tools/perf/bench/inject-buildid.c | 9 ++++++--- > tools/perf/builtin-inject.c | 12 +++++++++--- > tools/perf/tests/dlfilter-test.c | 8 ++++++-- > tools/perf/tests/sample-parsing.c | 5 +++-- > tools/perf/util/arm-spe.c | 7 +++++-- > tools/perf/util/cs-etm.c | 6 ++++-- > tools/perf/util/intel-bts.c | 3 ++- > tools/perf/util/intel-pt.c | 7 +++++-- > tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++------- > tools/perf/util/synthetic-events.h | 6 ++++-- > 10 files changed, 62 insertions(+), 26 deletions(-) > > diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c > index aad572a78d7f..bfd2c5ec9488 100644 > --- a/tools/perf/bench/inject-buildid.c > +++ b/tools/perf/bench/inject-buildid.c > @@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso, > > event.header.type = PERF_RECORD_SAMPLE; > event.header.misc = PERF_RECORD_MISC_USER; > - event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0); > - > - perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample); > + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0); > + perf_event__synthesize_sample(&event, bench_sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0, &sample); > > return writen(data->input_pipe[1], &event, event.header.size); > } > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > index f174bc69cec4..0c51cb4250d1 100644 > --- a/tools/perf/builtin-inject.c > +++ b/tools/perf/builtin-inject.c > @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, > /* remove sample_type {STACK,REGS}_USER for synthesize */ > sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); > > - perf_event__synthesize_sample(event_copy, sample_type, > - evsel->core.attr.read_format, sample); > + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, > + evsel->core.attr.read_format, > + evsel->core.attr.branch_sample_type, sample); > + if (ret) { > + pr_err("Failed to synthesize sample\n"); > + return ret; > + } > return perf_event__repipe_synth(tool, event_copy); > } > > @@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool, > sample_sw.period = sample->period; > sample_sw.time = sample->time; > perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type, > - evsel->core.attr.read_format, &sample_sw); > + evsel->core.attr.read_format, > + evsel->core.attr.branch_sample_type, &sample_sw); > build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine); > ret = perf_event__repipe(tool, event_sw, &sample_sw, machine); > perf_sample__exit(&sample_sw); > diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c > index e63790c61d53..204663571943 100644 > --- a/tools/perf/tests/dlfilter-test.c > +++ b/tools/perf/tests/dlfilter-test.c > @@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid > > event->header.type = PERF_RECORD_SAMPLE; > event->header.misc = PERF_RECORD_MISC_USER; > - event->header.size = perf_event__sample_event_size(&sample, sample_type, 0); > - err = perf_event__synthesize_sample(event, sample_type, 0, &sample); > + event->header.size = perf_event__sample_event_size(&sample, sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0); > + err = perf_event__synthesize_sample(event, sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0, &sample); > if (err) > return test_result("perf_event__synthesize_sample() failed", TEST_FAIL); > > diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c > index a7327c942ca2..55f0b73ca20e 100644 > --- a/tools/perf/tests/sample-parsing.c > +++ b/tools/perf/tests/sample-parsing.c > @@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) > sample.read.one.lost = 1; > } > > - sz = perf_event__sample_event_size(&sample, sample_type, read_format); > + sz = perf_event__sample_event_size(&sample, sample_type, read_format, > + evsel.core.attr.branch_sample_type); > bufsz = sz + 4096; /* Add a bit for overrun checking */ > event = malloc(bufsz); > if (!event) { > @@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) > event->header.size = sz; > > err = perf_event__synthesize_sample(event, sample_type, read_format, > - &sample); > + evsel.core.attr.branch_sample_type, &sample); > if (err) { > pr_debug("%s failed for sample_type %#"PRIx64", error %d\n", > "perf_event__synthesize_sample", sample_type, err); > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > index e5835042acdf..c4ed9f10e731 100644 > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c > @@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq) > > static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.type = PERF_RECORD_SAMPLE; > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } > > static inline int > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > index 8a639d2e51a4..1ebc1a6a5e75 100644 > --- a/tools/perf/util/cs-etm.c > +++ b/tools/perf/util/cs-etm.c > @@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, > static int cs_etm__inject_event(union perf_event *event, > struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } > > > diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c > index 382255393fb3..0b18ebd13f7c 100644 > --- a/tools/perf/util/intel-bts.c > +++ b/tools/perf/util/intel-bts.c > @@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq, > event.sample.header.size = bts->branches_event_size; > ret = perf_event__synthesize_sample(&event, > bts->branches_sample_type, > - 0, &sample); > + /*read_format=*/0, /*branch_sample_type=*/0, > + &sample); > if (ret) > return ret; > } > diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c > index fc9eec8b54b8..5142983e3243 100644 > --- a/tools/perf/util/intel-pt.c > +++ b/tools/perf/util/intel-pt.c > @@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, > static int intel_pt_inject_event(union perf_event *event, > struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } > > static inline int intel_pt_opt_inject(struct intel_pt *pt, > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c > index 85bee747f4cd..2461f25a4d7d 100644 > --- a/tools/perf/util/synthetic-events.c > +++ b/tools/perf/util/synthetic-events.c > @@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool, > return process(tool, (union perf_event *) &event, NULL, machine); > } > > -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format) > +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format, > + u64 branch_sample_type) > { > size_t sz, result = sizeof(struct perf_record_sample); > > @@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, > > if (type & PERF_SAMPLE_BRANCH_STACK) { > sz = sample->branch_stack->nr * sizeof(struct branch_entry); > - /* nr, hw_idx */ > - sz += 2 * sizeof(u64); > + /* nr */ > + sz += sizeof(u64); > + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) > + sz += sizeof(u64); > result += sz; > } > > @@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format, > } > > int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, > - const struct perf_sample *sample) > + u64 branch_sample_type, const struct perf_sample *sample) > { > __u64 *array; > size_t sz; > @@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo > > if (type & PERF_SAMPLE_BRANCH_STACK) { > sz = sample->branch_stack->nr * sizeof(struct branch_entry); > - /* nr, hw_idx */ > - sz += 2 * sizeof(u64); > - memcpy(array, sample->branch_stack, sz); > + > + *array++ = sample->branch_stack->nr; > + > + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) { > + if (sample->no_hw_idx) > + *array++ = 0; > + else > + *array++ = sample->branch_stack->hw_idx; > + } > + > + memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); > array = (void *)array + sz; > } > > diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h > index b0edad0c3100..8c7f49f9ccf5 100644 > --- a/tools/perf/util/synthetic-events.h > +++ b/tools/perf/util/synthetic-events.h > @@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_ > int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); > int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine); > int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); > -int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample); > +int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, > + u64 branch_sample_type, const struct perf_sample *sample); > int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); > int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); > int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); > @@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session, > > int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process); > > -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format); > +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, > + u64 read_format, u64 branch_sample_type); > > int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool, > struct target *target, struct perf_thread_map *threads, > -- > 2.54.0.545.g6539524ca2-goog > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers @ 2026-04-28 7:03 ` Ian Rogers 2026-04-28 20:20 ` sashiko-bot 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2 siblings, 1 reply; 10+ messages in thread From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria, linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi Cc: Ian Rogers When using "perf inject --itrace=L" to synthesize branch stacks from AUX data, several issues caused failures with the generated file: 1. The synthesized samples were delivered without the PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's sample_type. Fixed by using sample_type | evsel->synth_sample_type in intel_pt_deliver_synth_event. 2. The record layout was misaligned because of inconsistent handling of PERF_SAMPLE_BRANCH_HW_INDEX. Fixed by explicitly writing nr and hw_idx in perf_event__synthesize_sample. 3. Modifying evsel->core.attr.sample_type early in __cmd_inject caused parse failures for subsequent records in the input file. Fixed by moving this modification to just before writing the header. 4. perf_event__repipe_sample was narrowed to only synthesize samples when branch stack injection was requested, and restored the use of perf_inject__cut_auxtrace_sample as a fallback to preserve functionality. 5. Potential Heap Overflow in perf_event__repipe_sample : Addressed by adding a check that prints an error and returns -EFAULT if the calculated event size exceeds PERF_SAMPLE_MAX_SIZE , as you requested. 6. Header vs Payload Mismatch in __cmd_inject : Addressed by narrowing the condition so that HEADER_BRANCH_STACK is only set in the file header if add_last_branch was true. 7. NULL Pointer Dereference in intel-pt.c : Addressed by updating the condition in intel_pt_do_synth_pebs_sample to fill sample. branch_stack if it was synthesized, even if not in the original sample_type . Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/builtin-inject.c | 87 ++++++++++++++++++++++++++++++++----- tools/perf/util/intel-pt.c | 6 ++- 2 files changed, 81 insertions(+), 12 deletions(-) diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index 0c51cb4250d1..1f4d25a0efba 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -223,6 +223,11 @@ static int perf_event__repipe_attr(const struct perf_tool *tool, tool); int ret; + if (inject->itrace_synth_opts.add_last_branch) { + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + ret = perf_event__process_attr(tool, event, pevlist); if (ret) return ret; @@ -375,7 +380,60 @@ static int perf_event__repipe_sample(const struct perf_tool *tool, build_id__mark_dso_hit(tool, event, sample, evsel, machine); - if (inject->itrace_synth_opts.set && sample->aux_sample.size) { + if (inject->itrace_synth_opts.set && + (inject->itrace_synth_opts.last_branch || + inject->itrace_synth_opts.add_last_branch)) { + union perf_event *event_copy = (void *)inject->event_copy; + struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 }; + int err; + size_t sz; + u64 orig_type = evsel->core.attr.sample_type; + u64 orig_branch_type = evsel->core.attr.branch_sample_type; + + if (event_copy == NULL) { + inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE); + if (!inject->event_copy) + return -ENOMEM; + + event_copy = (void *)inject->event_copy; + } + + if (!sample->branch_stack) + sample->branch_stack = &dummy_bs; + + if (inject->itrace_synth_opts.add_last_branch) { + /* Temporarily add in type bits for synthesis. */ + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + + sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type); + + if (sz > PERF_SAMPLE_MAX_SIZE) { + pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE); + return -EFAULT; + } + + event_copy->header.type = PERF_RECORD_SAMPLE; + event_copy->header.misc = event->header.misc; + event_copy->header.size = sz; + + err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + + evsel->core.attr.sample_type = orig_type; + evsel->core.attr.branch_sample_type = orig_branch_type; + + if (err) { + pr_err("Failed to synthesize sample\n"); + return err; + } + event = event_copy; + } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) { event = perf_inject__cut_auxtrace_sample(inject, event, sample); if (IS_ERR(event)) return PTR_ERR(event); @@ -463,13 +521,9 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, - evsel->core.attr.read_format, - evsel->core.attr.branch_sample_type, sample); - if (ret) { - pr_err("Failed to synthesize sample\n"); - return ret; - } + perf_event__synthesize_sample(event_copy, sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); return perf_event__repipe_synth(tool, event_copy); } @@ -2440,12 +2494,25 @@ static int __cmd_inject(struct perf_inject *inject) * synthesized hardware events, so clear the feature flag. */ if (inject->itrace_synth_opts.set) { + struct evsel *evsel; + perf_header__clear_feat(&session->header, HEADER_AUXTRACE); - if (inject->itrace_synth_opts.last_branch || - inject->itrace_synth_opts.add_last_branch) + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + } + + if (inject->itrace_synth_opts.add_last_branch) { perf_header__set_feat(&session->header, HEADER_BRANCH_STACK); + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= + PERF_SAMPLE_BRANCH_HW_INDEX; + } + } } /* diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 5142983e3243..2dce6106c038 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1731,6 +1731,7 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, static int intel_pt_inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { + event->header.type = PERF_RECORD_SAMPLE; event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, /*branch_sample_type=*/0); @@ -2489,7 +2490,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse intel_pt_add_xmm(intr_regs, pos, items, regs_mask); } - if (sample_type & PERF_SAMPLE_BRANCH_STACK) { + if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) { if (items->mask[INTEL_PT_LBR_0_POS] || items->mask[INTEL_PT_LBR_1_POS] || items->mask[INTEL_PT_LBR_2_POS]) { @@ -2560,7 +2561,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse sample.transaction = txn; } - ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type); + ret = intel_pt_deliver_synth_event(pt, event, &sample, + sample_type | evsel->synth_sample_type); perf_sample__exit(&sample); return ret; } -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis 2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers @ 2026-04-28 20:20 ` sashiko-bot 0 siblings, 0 replies; 10+ messages in thread From: sashiko-bot @ 2026-04-28 20:20 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider: - [High] Silent data corruption due to removed error check in perf_event__convert_sample_callchain(). - [High] Pipe mode attribute headers incorrectly advertise PERF_SAMPLE_AUX, breaking downstream parsers. - [High] Out-of-bounds write and parser breakage in perf_event__repipe_attr(). - [High] Unadvertised `size` field left in payload by perf_inject__cut_auxtrace_sample() breaks parsing. - [High] Off-by-one error in sample size overflow check causes infinite parsing loops. - [High] Dangling pointer vulnerability via `sample->branch_stack` stack variable assignment. - [High] `hw_idx` omission in PEBS synthesis causes header vs payload mismatch. -- commit e339fa2584e0f4ca5e2c4e8eb68d763fd40fbc59 Author: Ian Rogers <irogers@google.com> perf inject: Fix itrace branch stack synthesis This patch fixes several issues with perf inject --itrace=L when synthesizing branch stacks from AUX data. It adjusts the sample types and event layout, and addresses parsing issues and potential heap overflows. > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > --- a/tools/perf/builtin-inject.c > +++ b/tools/perf/builtin-inject.c [ ... ] > @@ -223,6 +223,11 @@ static int perf_event__repipe_attr(const struct perf_tool *tool, > tool); > int ret; > > + if (inject->itrace_synth_opts.add_last_branch) { > + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; > + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; > + } > + Does modifying event->attr.attr.sample_type before calling perf_event__process_attr() risk an out-of-bounds write if the event buffer is truncated? Also, this populates the session's internal evsel with PERF_SAMPLE_BRANCH_STACK, which might cause perf_evsel__parse_sample() to fail on subsequent raw records that lack this data. For pipe mode, this function dynamically emits headers but doesn't clear PERF_SAMPLE_AUX from the repiped event->attr. Since the payload has the AUX data stripped, does this cause downstream parsers to misinterpret event boundaries because the header advertises data that isn't there? > ret = perf_event__process_attr(tool, event, pevlist); [ ... ] > @@ -375,7 +380,60 @@ static int perf_event__repipe_sample(const struct perf_tool *tool, > > build_id__mark_dso_hit(tool, event, sample, evsel, machine); > > - if (inject->itrace_synth_opts.set && sample->aux_sample.size) { > + if (inject->itrace_synth_opts.set && > + (inject->itrace_synth_opts.last_branch || > + inject->itrace_synth_opts.add_last_branch)) { > + union perf_event *event_copy = (void *)inject->event_copy; > + struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 }; > + int err; [ ... ] > + if (!sample->branch_stack) > + sample->branch_stack = &dummy_bs; If PERF_FORMAT_GROUP is used, could this leave a dangling pointer? If deliver_sample_group() reuses the exact same sample structure for subsequent deliveries, sample->branch_stack will point to dummy_bs which is allocated on the stack and goes out of scope when this function returns. > + > + if (inject->itrace_synth_opts.add_last_branch) { > + /* Temporarily add in type bits for synthesis. */ > + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; > + evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; > + } > + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; > + > + sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type, > + evsel->core.attr.read_format, > + evsel->core.attr.branch_sample_type); > + > + if (sz > PERF_SAMPLE_MAX_SIZE) { Does this allow sz to be exactly PERF_SAMPLE_MAX_SIZE (65536)? If sz evaluates to 65536, assigning it to event_copy->header.size (a __u16) would overflow and wrap to 0. Emitting an event with header.size = 0 can cause downstream parsers to enter an infinite loop. Should this check be >= PERF_SAMPLE_MAX_SIZE? > + pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE); > + return -EFAULT; > + } [ ... ] > + if (err) { > + pr_err("Failed to synthesize sample\n"); > + return err; > + } > + event = event_copy; > + } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) { > event = perf_inject__cut_auxtrace_sample(inject, event, sample); This fallback leaves an 8-byte size field (set to 0) in the sample payload. Since PERF_SAMPLE_AUX is now globally cleared from the evsel headers in __cmd_inject(), will parsers misinterpret this 8-byte field as the next field in the format (like DATA_SRC) and shift all subsequent fields? > if (IS_ERR(event)) > return PTR_ERR(event); [ ... ] > @@ -463,13 +521,9 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, > /* remove sample_type {STACK,REGS}_USER for synthesize */ > sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); > > - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, > - evsel->core.attr.read_format, > - evsel->core.attr.branch_sample_type, sample); > - if (ret) { > - pr_err("Failed to synthesize sample\n"); > - return ret; > - } > + perf_event__synthesize_sample(event_copy, sample_type, > + evsel->core.attr.read_format, > + evsel->core.attr.branch_sample_type, sample); > return perf_event__repipe_synth(tool, event_copy); Is it safe to remove the error check here? If synthesis fails and returns a negative error code, event_copy might remain uninitialized. Unconditionally passing it to perf_event__repipe_synth() could write malformed or uninitialized memory to the output stream. > } [ ... ] > diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c > --- a/tools/perf/util/intel-pt.c > +++ b/tools/perf/util/intel-pt.c [ ... ] > @@ -1731,6 +1731,7 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, > static int intel_pt_inject_event(union perf_event *event, > struct perf_sample *sample, u64 type) > { > + event->header.type = PERF_RECORD_SAMPLE; > event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > /*branch_sample_type=*/0); > Since __cmd_inject() globally sets PERF_SAMPLE_BRANCH_HW_INDEX on all evsels when add_last_branch is true, does passing branch_sample_type=0 here omit the hw_idx field from synthesized PEBS samples? Downstream parsers might misalign the payload if the file header advertises its presence but the payload omits it. -- Sashiko AI review · https://sashiko.dev/#/patchset/20260428070328.1880314-1-irogers@google.com?part=2 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers @ 2026-04-29 18:11 ` Ian Rogers 2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 2 siblings, 2 replies; 10+ messages in thread From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw) To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, ravi.bangoria, Ian Rogers An intel-pt trace can be turned into LBR events either in perf script or perf inject with the --itrace=L option. With perf inject the generated perf.data file failed to be parsed as the sample events were out of sync with their perf_event_attr. A range of fixes were required. This patch was separated from a large perf script refactor that highlighted the breakage: https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/ v2: Response to sashiko fixes for patch 2, Namhyung's acked-by for patch 1. v1: https://lore.kernel.org/lkml/20260428070328.1880314-1-irogers@google.com/ Ian Rogers (2): perf event: Fix size of synthesized sample with branch stacks perf inject: Fix itrace branch stack synthesis tools/perf/bench/inject-buildid.c | 9 ++- tools/perf/builtin-inject.c | 107 ++++++++++++++++++++++++++--- tools/perf/tests/dlfilter-test.c | 8 ++- tools/perf/tests/sample-parsing.c | 5 +- tools/perf/util/arm-spe.c | 7 +- tools/perf/util/cs-etm.c | 6 +- tools/perf/util/intel-bts.c | 3 +- tools/perf/util/intel-pt.c | 26 +++++-- tools/perf/util/synthetic-events.c | 25 +++++-- tools/perf/util/synthetic-events.h | 6 +- 10 files changed, 165 insertions(+), 37 deletions(-) -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers @ 2026-04-29 18:11 ` Ian Rogers 2026-04-29 20:51 ` sashiko-bot 2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 1 sibling, 1 reply; 10+ messages in thread From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw) To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, ravi.bangoria, Ian Rogers Synthesizing branch stacks for Intel-PT highlighed an issue where PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the perf_event_attr branch_sample_type. This caused an incorrect size calculation. Fix the writing of the nr and hw_idx values during sample event synthesis. Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> --- tools/perf/bench/inject-buildid.c | 9 ++++++--- tools/perf/builtin-inject.c | 12 +++++++++--- tools/perf/tests/dlfilter-test.c | 8 ++++++-- tools/perf/tests/sample-parsing.c | 5 +++-- tools/perf/util/arm-spe.c | 7 +++++-- tools/perf/util/cs-etm.c | 6 ++++-- tools/perf/util/intel-bts.c | 3 ++- tools/perf/util/intel-pt.c | 7 +++++-- tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++------- tools/perf/util/synthetic-events.h | 6 ++++-- 10 files changed, 62 insertions(+), 26 deletions(-) diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c index aad572a78d7f..bfd2c5ec9488 100644 --- a/tools/perf/bench/inject-buildid.c +++ b/tools/perf/bench/inject-buildid.c @@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso, event.header.type = PERF_RECORD_SAMPLE; event.header.misc = PERF_RECORD_MISC_USER; - event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0); - - perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample); + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + perf_event__synthesize_sample(&event, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); return writen(data->input_pipe[1], &event, event.header.size); } diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index f174bc69cec4..0c51cb4250d1 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - perf_event__synthesize_sample(event_copy, sample_type, - evsel->core.attr.read_format, sample); + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + if (ret) { + pr_err("Failed to synthesize sample\n"); + return ret; + } return perf_event__repipe_synth(tool, event_copy); } @@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool, sample_sw.period = sample->period; sample_sw.time = sample->time; perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type, - evsel->core.attr.read_format, &sample_sw); + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, &sample_sw); build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine); ret = perf_event__repipe(tool, event_sw, &sample_sw, machine); perf_sample__exit(&sample_sw); diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c index e63790c61d53..204663571943 100644 --- a/tools/perf/tests/dlfilter-test.c +++ b/tools/perf/tests/dlfilter-test.c @@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid event->header.type = PERF_RECORD_SAMPLE; event->header.misc = PERF_RECORD_MISC_USER; - event->header.size = perf_event__sample_event_size(&sample, sample_type, 0); - err = perf_event__synthesize_sample(event, sample_type, 0, &sample); + event->header.size = perf_event__sample_event_size(&sample, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + err = perf_event__synthesize_sample(event, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); if (err) return test_result("perf_event__synthesize_sample() failed", TEST_FAIL); diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c index a7327c942ca2..55f0b73ca20e 100644 --- a/tools/perf/tests/sample-parsing.c +++ b/tools/perf/tests/sample-parsing.c @@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) sample.read.one.lost = 1; } - sz = perf_event__sample_event_size(&sample, sample_type, read_format); + sz = perf_event__sample_event_size(&sample, sample_type, read_format, + evsel.core.attr.branch_sample_type); bufsz = sz + 4096; /* Add a bit for overrun checking */ event = malloc(bufsz); if (!event) { @@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) event->header.size = sz; err = perf_event__synthesize_sample(event, sample_type, read_format, - &sample); + evsel.core.attr.branch_sample_type, &sample); if (err) { pr_debug("%s failed for sample_type %#"PRIx64", error %d\n", "perf_event__synthesize_sample", sample_type, err); diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index e5835042acdf..c4ed9f10e731 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq) static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.type = PERF_RECORD_SAMPLE; + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8a639d2e51a4..1ebc1a6a5e75 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, static int cs_etm__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c index 382255393fb3..0b18ebd13f7c 100644 --- a/tools/perf/util/intel-bts.c +++ b/tools/perf/util/intel-bts.c @@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq, event.sample.header.size = bts->branches_event_size; ret = perf_event__synthesize_sample(&event, bts->branches_sample_type, - 0, &sample); + /*read_format=*/0, /*branch_sample_type=*/0, + &sample); if (ret) return ret; } diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index fc9eec8b54b8..5142983e3243 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, static int intel_pt_inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int intel_pt_opt_inject(struct intel_pt *pt, diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 85bee747f4cd..2461f25a4d7d 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool, return process(tool, (union perf_event *) &event, NULL, machine); } -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format) +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format, + u64 branch_sample_type) { size_t sz, result = sizeof(struct perf_record_sample); @@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); + /* nr */ + sz += sizeof(u64); + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) + sz += sizeof(u64); result += sz; } @@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format, } int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, - const struct perf_sample *sample) + u64 branch_sample_type, const struct perf_sample *sample) { __u64 *array; size_t sz; @@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); - memcpy(array, sample->branch_stack, sz); + + *array++ = sample->branch_stack->nr; + + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) { + if (sample->no_hw_idx) + *array++ = 0; + else + *array++ = sample->branch_stack->hw_idx; + } + + memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); array = (void *)array + sz; } diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h index b0edad0c3100..8c7f49f9ccf5 100644 --- a/tools/perf/util/synthetic-events.h +++ b/tools/perf/util/synthetic-events.h @@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_ int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); -int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample); +int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, + u64 branch_sample_type, const struct perf_sample *sample); int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); @@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session, int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process); -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format); +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, + u64 read_format, u64 branch_sample_type); int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool, struct target *target, struct perf_thread_map *threads, -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks 2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers @ 2026-04-29 20:51 ` sashiko-bot 0 siblings, 0 replies; 10+ messages in thread From: sashiko-bot @ 2026-04-29 20:51 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] Synthesized branch stack samples in hardware tracers (arm-spe, cs-etm, intel-pt) omit the hw_idx field despite the event attribute claiming its presence, causing parsers to read misaligned data. - [High] In perf_event__convert_sample_callchain(), passing the unmasked evsel->core.attr.sample_type causes the generated payload to serialize stack and register data out of sync with the reduced header.size, leading to data stream corruption. -- commit eb27565513c2df3df4959ab5f50b38abd090fa86 Author: Ian Rogers <irogers@google.com> perf event: Fix size of synthesized sample with branch stacks This commit updates sample event synthesis to correctly size and write the nr and hw_idx values for branch stacks by explicitly checking the PERF_SAMPLE_BRANCH_HW_INDEX flag. > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > --- a/tools/perf/builtin-inject.c > +++ b/tools/perf/builtin-inject.c [ ... ] > @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, > /* remove sample_type {STACK,REGS}_USER for synthesize */ > sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); > > - perf_event__synthesize_sample(event_copy, sample_type, > - evsel->core.attr.read_format, sample); > + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, > + evsel->core.attr.read_format, > + evsel->core.attr.branch_sample_type, sample); I noticed this is fixed later in the patch series by commit "perf inject: Fix itrace branch stack synthesis". Does passing the unmasked evsel->core.attr.sample_type here instead of the local masked sample_type variable cause the generated payload to serialize stack and register data out of sync with the reduced header size? [ ... ] > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c [ ... ] > static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.type = PERF_RECORD_SAMPLE; > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } I see this is fixed for intel-pt.c later in the series by commit "perf inject: Fix itrace branch stack synthesis", but it appears it remains unfixed for arm-spe.c and cs-etm.c. If branch_sample_type is hardcoded to 0 here, will it omit the hw_idx field even if the synthesized event attribute claims it is present? If so, will parsers reading the payload misinterpret the first 8 bytes of the branch entries as the hw_idx field, shifting subsequent sample fields? [ ... ] > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > --- a/tools/perf/util/cs-etm.c > +++ b/tools/perf/util/cs-etm.c [ ... ] > static int cs_etm__inject_event(union perf_event *event, > struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } Does this function suffer from the same issue, where hardcoding branch_sample_type to 0 omits the hw_idx field and causes misaligned reads during parsing? -- Sashiko AI review · https://sashiko.dev/#/patchset/20260429181136.2712655-1-irogers@google.com?part=1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers @ 2026-04-29 18:11 ` Ian Rogers 2026-04-29 21:18 ` sashiko-bot 1 sibling, 1 reply; 10+ messages in thread From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw) To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, ravi.bangoria, Ian Rogers When using "perf inject --itrace=L" to synthesize branch stacks from AUX data, several issues caused failures with the generated file: 1. The synthesized samples were delivered without the PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's sample_type. Fixed by using sample_type | evsel->synth_sample_type in intel_pt_do_synth_pebs_sample. 2. Modifying evsel->core.attr.sample_type early in __cmd_inject caused parse failures for subsequent records in the input file. Fixed by moving this modification to just before writing the header. 3. perf_event__repipe_sample was narrowed to only synthesize samples when branch stack injection was requested, and restored the use of perf_inject__cut_auxtrace_sample as a fallback to preserve functionality. 4. Potential Heap Overflow in perf_event__repipe_sample: Addressed by adding a check that prints an error and returns -EFAULT if the calculated event size exceeds PERF_SAMPLE_MAX_SIZE. 5. Header vs Payload Mismatch in __cmd_inject: Addressed by narrowing the condition so that HEADER_BRANCH_STACK is only set in the file header if add_last_branch was true. 6. NULL Pointer Dereference in intel-pt.c: Addressed by updating the condition in intel_pt_do_synth_pebs_sample to fill sample. branch_stack if it was synthesized, even if not in the original sample_type. 7. Modifying event attributes in perf_event__repipe_attr early caused downstream parser breakage in pipe mode. Fixed by clearing PERF_SAMPLE_AUX when itrace_synth_opts.set is true to ensure the headers match the stripped payload. Also added a size check against struct perf_record_header_attr to prevent out-of-bounds writes. 8. Potential dangling pointer vulnerability in perf_event__repipe_sample: Addressed by restoring the original sample->branch_stack pointer before returning. 9. Off-by-one error in sample size check in perf_event__repipe_sample: Fixed by checking if sz >= PERF_SAMPLE_MAX_SIZE instead of >. 10. Unadvertised size field left in payload by cut_auxtrace_sample: Addressed by excluding the 8-byte size field from the copied payload to correctly match the cleared PERF_SAMPLE_AUX bit. 11. Silent data corruption in convert_sample_callchain: Addressed by restoring the error check for perf_event__synthesize_sample to prevent writing malformed or uninitialized memory. 12. Omission of hw_idx in PEBS synthesis: Fixed by passing the correct evsel->core.attr.branch_sample_type instead of 0 in intel-pt.c. Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> --- v2: - Fix error handling in callchain conversion by keeping the error check. - Fix header advertise issues in pipe mode by clearing PERF_SAMPLE_AUX. - Prevent out-of-bounds writes in attribute repiping with size checks. - Fix sample payload misalignment by cutting the unadvertised AUX size field. - Fix off-by-one check in sample size overflow to correctly use >=. - Avoid dangling pointer vulnerability by restoring the branch_stack pointer. - Pass evsel's branch_sample_type to synthesized samples to prevent hw_idx omission. --- tools/perf/builtin-inject.c | 97 ++++++++++++++++++++++++++++++++++--- tools/perf/util/intel-pt.c | 23 ++++++--- 2 files changed, 106 insertions(+), 14 deletions(-) diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index 0c51cb4250d1..13fb1f05ceb2 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -223,6 +223,19 @@ static int perf_event__repipe_attr(const struct perf_tool *tool, tool); int ret; + if (event->header.size < sizeof(struct perf_record_header_attr)) { + pr_err("Attribute event size %u is too small\n", event->header.size); + return -EINVAL; + } + + if (inject->itrace_synth_opts.set) + event->attr.attr.sample_type &= ~PERF_SAMPLE_AUX; + + if (inject->itrace_synth_opts.add_last_branch) { + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + ret = perf_event__process_attr(tool, event, pevlist); if (ret) return ret; @@ -330,8 +343,8 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject, union perf_event *event, struct perf_sample *sample) { - size_t sz1 = sample->aux_sample.data - (void *)event; - size_t sz2 = event->header.size - sample->aux_sample.size - sz1; + size_t sz1 = sample->aux_sample.data - (void *)event - sizeof(u64); + size_t sz2 = event->header.size - sample->aux_sample.size - (sz1 + sizeof(u64)); union perf_event *ev; if (inject->event_copy == NULL) { @@ -342,13 +355,12 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject, ev = (union perf_event *)inject->event_copy; if (sz1 > event->header.size || sz2 > event->header.size || sz1 + sz2 > event->header.size || - sz1 < sizeof(struct perf_event_header) + sizeof(u64)) + sz1 < sizeof(struct perf_event_header)) return event; memcpy(ev, event, sz1); memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2); ev->header.size = sz1 + sz2; - ((u64 *)((void *)ev + sz1))[-1] = 0; return ev; } @@ -375,7 +387,63 @@ static int perf_event__repipe_sample(const struct perf_tool *tool, build_id__mark_dso_hit(tool, event, sample, evsel, machine); - if (inject->itrace_synth_opts.set && sample->aux_sample.size) { + if (inject->itrace_synth_opts.set && + (inject->itrace_synth_opts.last_branch || + inject->itrace_synth_opts.add_last_branch)) { + union perf_event *event_copy = (void *)inject->event_copy; + struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 }; + int err; + size_t sz; + u64 orig_type = evsel->core.attr.sample_type; + u64 orig_branch_type = evsel->core.attr.branch_sample_type; + + struct branch_stack *orig_bs = sample->branch_stack; + + if (event_copy == NULL) { + inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE); + if (!inject->event_copy) + return -ENOMEM; + + event_copy = (void *)inject->event_copy; + } + + if (!sample->branch_stack) + sample->branch_stack = &dummy_bs; + + if (inject->itrace_synth_opts.add_last_branch) { + /* Temporarily add in type bits for synthesis. */ + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + + sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type); + + if (sz >= PERF_SAMPLE_MAX_SIZE) { + pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE); + return -EFAULT; + } + + event_copy->header.type = PERF_RECORD_SAMPLE; + event_copy->header.misc = event->header.misc; + event_copy->header.size = sz; + + err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + + evsel->core.attr.sample_type = orig_type; + evsel->core.attr.branch_sample_type = orig_branch_type; + sample->branch_stack = orig_bs; + + if (err) { + pr_err("Failed to synthesize sample\n"); + return err; + } + event = event_copy; + } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) { event = perf_inject__cut_auxtrace_sample(inject, event, sample); if (IS_ERR(event)) return PTR_ERR(event); @@ -463,7 +531,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + ret = perf_event__synthesize_sample(event_copy, sample_type, evsel->core.attr.read_format, evsel->core.attr.branch_sample_type, sample); if (ret) { @@ -2440,12 +2508,25 @@ static int __cmd_inject(struct perf_inject *inject) * synthesized hardware events, so clear the feature flag. */ if (inject->itrace_synth_opts.set) { + struct evsel *evsel; + perf_header__clear_feat(&session->header, HEADER_AUXTRACE); - if (inject->itrace_synth_opts.last_branch || - inject->itrace_synth_opts.add_last_branch) + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + } + + if (inject->itrace_synth_opts.add_last_branch) { perf_header__set_feat(&session->header, HEADER_BRANCH_STACK); + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= + PERF_SAMPLE_BRANCH_HW_INDEX; + } + } } /* diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 5142983e3243..f45dbdd4d323 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1728,14 +1728,24 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, event->sample.header.misc = sample->cpumode; } -static int intel_pt_inject_event(union perf_event *event, +static int intel_pt_inject_event(struct intel_pt *pt, union perf_event *event, struct perf_sample *sample, u64 type) { + struct evsel *evsel = NULL; + u64 branch_sample_type = 0; + + if (pt->session && pt->session->evlist) + evsel = evlist__id2evsel(pt->session->evlist, sample->id); + + if (evsel) + branch_sample_type = evsel->core.attr.branch_sample_type; + + event->header.type = PERF_RECORD_SAMPLE; event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, - /*branch_sample_type=*/0); + branch_sample_type); return perf_event__synthesize_sample(event, type, /*read_format=*/0, - /*branch_sample_type=*/0, sample); + branch_sample_type, sample); } static inline int intel_pt_opt_inject(struct intel_pt *pt, @@ -1745,7 +1755,7 @@ static inline int intel_pt_opt_inject(struct intel_pt *pt, if (!pt->synth_opts.inject) return 0; - return intel_pt_inject_event(event, sample, type); + return intel_pt_inject_event(pt, event, sample, type); } static int intel_pt_deliver_synth_event(struct intel_pt *pt, @@ -2489,7 +2499,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse intel_pt_add_xmm(intr_regs, pos, items, regs_mask); } - if (sample_type & PERF_SAMPLE_BRANCH_STACK) { + if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) { if (items->mask[INTEL_PT_LBR_0_POS] || items->mask[INTEL_PT_LBR_1_POS] || items->mask[INTEL_PT_LBR_2_POS]) { @@ -2560,7 +2570,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse sample.transaction = txn; } - ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type); + ret = intel_pt_deliver_synth_event(pt, event, &sample, + sample_type | evsel->synth_sample_type); perf_sample__exit(&sample); return ret; } -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis 2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers @ 2026-04-29 21:18 ` sashiko-bot 0 siblings, 0 replies; 10+ messages in thread From: sashiko-bot @ 2026-04-29 21:18 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-users Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider: - [Critical] Modifying `event->attr.attr` in-place in `perf_event__repipe_attr` causes SIGSEGV and completely breaks sample parsing. - [High] The size check in `perf_event__repipe_attr` incorrectly rejects valid older files and causes out-of-bounds writes. - [High] Flawed `sample_size` recalculation in `perf_event__convert_sample_callchain` causes heap memory leaks and malformed output. -- commit 64e2d17135bd2355121cf62d57be79321e910c46 Author: Ian Rogers <irogers@google.com> perf inject: Fix itrace branch stack synthesis This commit addresses several issues with synthesizing branch stacks from AUX data when using "perf inject --itrace=L". It modifies sample types early to handle missing branch stack flags and fixes sample synthesis logic. > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > index 0c51cb4250d1..13fb1f05ceb2 100644 > --- a/tools/perf/builtin-inject.c > +++ b/tools/perf/builtin-inject.c > @@ -223,6 +223,19 @@ static int perf_event__repipe_attr(const struct perf_tool *tool, > tool); > int ret; > > + if (event->header.size < sizeof(struct perf_record_header_attr)) { Does this size check incorrectly reject valid older files? Older perf.data files have a smaller attr.size (e.g., 64 bytes). Valid events from older kernels may have a total size of 72 bytes, triggering this check and causing perf inject to wrongly reject them. If an older event has enough trailing IDs to pass the size check, the struct ends at offset 64 and the id[] array begins at offset 72. Modifying event->attr.attr.branch_sample_type writes to offset 80, which overlaps with the id[] array and can corrupt the event ID data. > + pr_err("Attribute event size %u is too small\n", event->header.size); > + return -EINVAL; > + } > + > + if (inject->itrace_synth_opts.set) > + event->attr.attr.sample_type &= ~PERF_SAMPLE_AUX; > + > + if (inject->itrace_synth_opts.add_last_branch) { > + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; > + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; > + } Could modifying event->attr.attr in-place here cause a segmentation fault? When reading from a standard perf.data file, perf_session__process_events() mmaps the file into memory using PROT_READ. The event pointer passed to perf_event__repipe_attr() points directly into this read-only buffer, so writing to event->attr.attr.sample_type could immediately trigger a SIGSEGV. Even when reading from a pipe where the buffer might be writable, applying these modifications before calling perf_event__process_attr() alters the flags received by the internal evsel. The sample records arriving next in the stream still physically contain the original layout, which causes evsel__parse_sample() to misinterpret payload offsets and abort the session. > ret = perf_event__process_attr(tool, event, pevlist); [ ... ] > @@ -463,7 +531,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, > /* remove sample_type {STACK,REGS}_USER for synthesize */ > sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); > > - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, > + ret = perf_event__synthesize_sample(event_copy, sample_type, This wasn't introduced by this patch, but does the sample_size recalculation just above this cause heap memory leaks and malformed output? When adjusting the size of the synthesized sample, the function adds the size of the newly injected callchain but fails to subtract the size of the original callchain present in the event. Additionally, when removing STACK_USER, it only subtracts the payload size, failing to remove the 16 bytes of length prefix and suffix metadata. This heavily inflates the calculated sample_size. While perf_event__synthesize_sample() correctly writes the smaller, actual data to the heap buffer, perf_event__repipe_synth() writes the inflated sample_size to the output pipe, leaking uninitialized heap memory into the file and resulting in an oversized, malformed record. > evsel->core.attr.read_format, > evsel->core.attr.branch_sample_type, sample); -- Sashiko AI review · https://sashiko.dev/#/patchset/20260429181136.2712655-1-irogers@google.com?part=2 ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-29 21:18 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-28 23:19 ` Namhyung Kim 2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 2026-04-28 20:20 ` sashiko-bot 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-29 20:51 ` sashiko-bot 2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 2026-04-29 21:18 ` sashiko-bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox