* [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes
@ 2026-04-28 7:03 Ian Rogers
2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria,
linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi
Cc: Ian Rogers
An intel-pt trace can be turned into LBR events either in perf script
or perf inject with the --itrace=L option. With perf inject the
generated perf.data file failed to be parsed as the sample events were
out of sync with their perf_event_attr. A range of fixes were
required.
This patch was separated from a large perf script refactor that
highlighted the breakage:
https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/
Ian Rogers (2):
perf event: Fix size of synthesized sample with branch stacks
perf inject: Fix itrace branch stack synthesis
tools/perf/bench/inject-buildid.c | 9 ++--
tools/perf/builtin-inject.c | 83 ++++++++++++++++++++++++++++--
tools/perf/tests/dlfilter-test.c | 8 ++-
tools/perf/tests/sample-parsing.c | 5 +-
tools/perf/util/arm-spe.c | 7 ++-
tools/perf/util/cs-etm.c | 6 ++-
tools/perf/util/intel-bts.c | 3 +-
tools/perf/util/intel-pt.c | 13 +++--
tools/perf/util/synthetic-events.c | 25 ++++++---
tools/perf/util/synthetic-events.h | 6 ++-
10 files changed, 135 insertions(+), 30 deletions(-)
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers @ 2026-04-28 7:03 ` Ian Rogers 2026-04-28 23:19 ` Namhyung Kim 2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2 siblings, 1 reply; 7+ messages in thread From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria, linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi Cc: Ian Rogers Synthesizing branch stacks for Intel-PT highlighed an issue where PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the perf_event_attr branch_sample_type. This caused an incorrect size calculation. Fix the writing of the nr and hw_idx values during sample event synthesis. Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/bench/inject-buildid.c | 9 ++++++--- tools/perf/builtin-inject.c | 12 +++++++++--- tools/perf/tests/dlfilter-test.c | 8 ++++++-- tools/perf/tests/sample-parsing.c | 5 +++-- tools/perf/util/arm-spe.c | 7 +++++-- tools/perf/util/cs-etm.c | 6 ++++-- tools/perf/util/intel-bts.c | 3 ++- tools/perf/util/intel-pt.c | 7 +++++-- tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++------- tools/perf/util/synthetic-events.h | 6 ++++-- 10 files changed, 62 insertions(+), 26 deletions(-) diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c index aad572a78d7f..bfd2c5ec9488 100644 --- a/tools/perf/bench/inject-buildid.c +++ b/tools/perf/bench/inject-buildid.c @@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso, event.header.type = PERF_RECORD_SAMPLE; event.header.misc = PERF_RECORD_MISC_USER; - event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0); - - perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample); + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + perf_event__synthesize_sample(&event, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); return writen(data->input_pipe[1], &event, event.header.size); } diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index f174bc69cec4..0c51cb4250d1 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - perf_event__synthesize_sample(event_copy, sample_type, - evsel->core.attr.read_format, sample); + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + if (ret) { + pr_err("Failed to synthesize sample\n"); + return ret; + } return perf_event__repipe_synth(tool, event_copy); } @@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool, sample_sw.period = sample->period; sample_sw.time = sample->time; perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type, - evsel->core.attr.read_format, &sample_sw); + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, &sample_sw); build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine); ret = perf_event__repipe(tool, event_sw, &sample_sw, machine); perf_sample__exit(&sample_sw); diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c index e63790c61d53..204663571943 100644 --- a/tools/perf/tests/dlfilter-test.c +++ b/tools/perf/tests/dlfilter-test.c @@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid event->header.type = PERF_RECORD_SAMPLE; event->header.misc = PERF_RECORD_MISC_USER; - event->header.size = perf_event__sample_event_size(&sample, sample_type, 0); - err = perf_event__synthesize_sample(event, sample_type, 0, &sample); + event->header.size = perf_event__sample_event_size(&sample, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + err = perf_event__synthesize_sample(event, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); if (err) return test_result("perf_event__synthesize_sample() failed", TEST_FAIL); diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c index a7327c942ca2..55f0b73ca20e 100644 --- a/tools/perf/tests/sample-parsing.c +++ b/tools/perf/tests/sample-parsing.c @@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) sample.read.one.lost = 1; } - sz = perf_event__sample_event_size(&sample, sample_type, read_format); + sz = perf_event__sample_event_size(&sample, sample_type, read_format, + evsel.core.attr.branch_sample_type); bufsz = sz + 4096; /* Add a bit for overrun checking */ event = malloc(bufsz); if (!event) { @@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) event->header.size = sz; err = perf_event__synthesize_sample(event, sample_type, read_format, - &sample); + evsel.core.attr.branch_sample_type, &sample); if (err) { pr_debug("%s failed for sample_type %#"PRIx64", error %d\n", "perf_event__synthesize_sample", sample_type, err); diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index e5835042acdf..c4ed9f10e731 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq) static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.type = PERF_RECORD_SAMPLE; + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8a639d2e51a4..1ebc1a6a5e75 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, static int cs_etm__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c index 382255393fb3..0b18ebd13f7c 100644 --- a/tools/perf/util/intel-bts.c +++ b/tools/perf/util/intel-bts.c @@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq, event.sample.header.size = bts->branches_event_size; ret = perf_event__synthesize_sample(&event, bts->branches_sample_type, - 0, &sample); + /*read_format=*/0, /*branch_sample_type=*/0, + &sample); if (ret) return ret; } diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index fc9eec8b54b8..5142983e3243 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, static int intel_pt_inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int intel_pt_opt_inject(struct intel_pt *pt, diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 85bee747f4cd..2461f25a4d7d 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool, return process(tool, (union perf_event *) &event, NULL, machine); } -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format) +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format, + u64 branch_sample_type) { size_t sz, result = sizeof(struct perf_record_sample); @@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); + /* nr */ + sz += sizeof(u64); + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) + sz += sizeof(u64); result += sz; } @@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format, } int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, - const struct perf_sample *sample) + u64 branch_sample_type, const struct perf_sample *sample) { __u64 *array; size_t sz; @@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); - memcpy(array, sample->branch_stack, sz); + + *array++ = sample->branch_stack->nr; + + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) { + if (sample->no_hw_idx) + *array++ = 0; + else + *array++ = sample->branch_stack->hw_idx; + } + + memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); array = (void *)array + sz; } diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h index b0edad0c3100..8c7f49f9ccf5 100644 --- a/tools/perf/util/synthetic-events.h +++ b/tools/perf/util/synthetic-events.h @@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_ int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); -int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample); +int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, + u64 branch_sample_type, const struct perf_sample *sample); int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); @@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session, int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process); -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format); +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, + u64 read_format, u64 branch_sample_type); int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool, struct target *target, struct perf_thread_map *threads, -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers @ 2026-04-28 23:19 ` Namhyung Kim 0 siblings, 0 replies; 7+ messages in thread From: Namhyung Kim @ 2026-04-28 23:19 UTC (permalink / raw) To: Ian Rogers Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria, linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi On Tue, Apr 28, 2026 at 12:03:27AM -0700, Ian Rogers wrote: > Synthesizing branch stacks for Intel-PT highlighed an issue where > PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the > perf_event_attr branch_sample_type. This caused an incorrect size > calculation. > > Fix the writing of the nr and hw_idx values during sample event > synthesis. > > Assisted-by: Gemini:gemini-3.1-pro-preview > Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Thanks, Namhyung > --- > tools/perf/bench/inject-buildid.c | 9 ++++++--- > tools/perf/builtin-inject.c | 12 +++++++++--- > tools/perf/tests/dlfilter-test.c | 8 ++++++-- > tools/perf/tests/sample-parsing.c | 5 +++-- > tools/perf/util/arm-spe.c | 7 +++++-- > tools/perf/util/cs-etm.c | 6 ++++-- > tools/perf/util/intel-bts.c | 3 ++- > tools/perf/util/intel-pt.c | 7 +++++-- > tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++------- > tools/perf/util/synthetic-events.h | 6 ++++-- > 10 files changed, 62 insertions(+), 26 deletions(-) > > diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c > index aad572a78d7f..bfd2c5ec9488 100644 > --- a/tools/perf/bench/inject-buildid.c > +++ b/tools/perf/bench/inject-buildid.c > @@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso, > > event.header.type = PERF_RECORD_SAMPLE; > event.header.misc = PERF_RECORD_MISC_USER; > - event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0); > - > - perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample); > + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0); > + perf_event__synthesize_sample(&event, bench_sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0, &sample); > > return writen(data->input_pipe[1], &event, event.header.size); > } > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c > index f174bc69cec4..0c51cb4250d1 100644 > --- a/tools/perf/builtin-inject.c > +++ b/tools/perf/builtin-inject.c > @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, > /* remove sample_type {STACK,REGS}_USER for synthesize */ > sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); > > - perf_event__synthesize_sample(event_copy, sample_type, > - evsel->core.attr.read_format, sample); > + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, > + evsel->core.attr.read_format, > + evsel->core.attr.branch_sample_type, sample); > + if (ret) { > + pr_err("Failed to synthesize sample\n"); > + return ret; > + } > return perf_event__repipe_synth(tool, event_copy); > } > > @@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool, > sample_sw.period = sample->period; > sample_sw.time = sample->time; > perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type, > - evsel->core.attr.read_format, &sample_sw); > + evsel->core.attr.read_format, > + evsel->core.attr.branch_sample_type, &sample_sw); > build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine); > ret = perf_event__repipe(tool, event_sw, &sample_sw, machine); > perf_sample__exit(&sample_sw); > diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c > index e63790c61d53..204663571943 100644 > --- a/tools/perf/tests/dlfilter-test.c > +++ b/tools/perf/tests/dlfilter-test.c > @@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid > > event->header.type = PERF_RECORD_SAMPLE; > event->header.misc = PERF_RECORD_MISC_USER; > - event->header.size = perf_event__sample_event_size(&sample, sample_type, 0); > - err = perf_event__synthesize_sample(event, sample_type, 0, &sample); > + event->header.size = perf_event__sample_event_size(&sample, sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0); > + err = perf_event__synthesize_sample(event, sample_type, > + /*read_format=*/0, > + /*branch_sample_type=*/0, &sample); > if (err) > return test_result("perf_event__synthesize_sample() failed", TEST_FAIL); > > diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c > index a7327c942ca2..55f0b73ca20e 100644 > --- a/tools/perf/tests/sample-parsing.c > +++ b/tools/perf/tests/sample-parsing.c > @@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) > sample.read.one.lost = 1; > } > > - sz = perf_event__sample_event_size(&sample, sample_type, read_format); > + sz = perf_event__sample_event_size(&sample, sample_type, read_format, > + evsel.core.attr.branch_sample_type); > bufsz = sz + 4096; /* Add a bit for overrun checking */ > event = malloc(bufsz); > if (!event) { > @@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) > event->header.size = sz; > > err = perf_event__synthesize_sample(event, sample_type, read_format, > - &sample); > + evsel.core.attr.branch_sample_type, &sample); > if (err) { > pr_debug("%s failed for sample_type %#"PRIx64", error %d\n", > "perf_event__synthesize_sample", sample_type, err); > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c > index e5835042acdf..c4ed9f10e731 100644 > --- a/tools/perf/util/arm-spe.c > +++ b/tools/perf/util/arm-spe.c > @@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq) > > static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.type = PERF_RECORD_SAMPLE; > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } > > static inline int > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c > index 8a639d2e51a4..1ebc1a6a5e75 100644 > --- a/tools/perf/util/cs-etm.c > +++ b/tools/perf/util/cs-etm.c > @@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, > static int cs_etm__inject_event(union perf_event *event, > struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } > > > diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c > index 382255393fb3..0b18ebd13f7c 100644 > --- a/tools/perf/util/intel-bts.c > +++ b/tools/perf/util/intel-bts.c > @@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq, > event.sample.header.size = bts->branches_event_size; > ret = perf_event__synthesize_sample(&event, > bts->branches_sample_type, > - 0, &sample); > + /*read_format=*/0, /*branch_sample_type=*/0, > + &sample); > if (ret) > return ret; > } > diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c > index fc9eec8b54b8..5142983e3243 100644 > --- a/tools/perf/util/intel-pt.c > +++ b/tools/perf/util/intel-pt.c > @@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, > static int intel_pt_inject_event(union perf_event *event, > struct perf_sample *sample, u64 type) > { > - event->header.size = perf_event__sample_event_size(sample, type, 0); > - return perf_event__synthesize_sample(event, type, 0, sample); > + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, > + /*branch_sample_type=*/0); > + > + return perf_event__synthesize_sample(event, type, /*read_format=*/0, > + /*branch_sample_type=*/0, sample); > } > > static inline int intel_pt_opt_inject(struct intel_pt *pt, > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c > index 85bee747f4cd..2461f25a4d7d 100644 > --- a/tools/perf/util/synthetic-events.c > +++ b/tools/perf/util/synthetic-events.c > @@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool, > return process(tool, (union perf_event *) &event, NULL, machine); > } > > -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format) > +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format, > + u64 branch_sample_type) > { > size_t sz, result = sizeof(struct perf_record_sample); > > @@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, > > if (type & PERF_SAMPLE_BRANCH_STACK) { > sz = sample->branch_stack->nr * sizeof(struct branch_entry); > - /* nr, hw_idx */ > - sz += 2 * sizeof(u64); > + /* nr */ > + sz += sizeof(u64); > + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) > + sz += sizeof(u64); > result += sz; > } > > @@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format, > } > > int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, > - const struct perf_sample *sample) > + u64 branch_sample_type, const struct perf_sample *sample) > { > __u64 *array; > size_t sz; > @@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo > > if (type & PERF_SAMPLE_BRANCH_STACK) { > sz = sample->branch_stack->nr * sizeof(struct branch_entry); > - /* nr, hw_idx */ > - sz += 2 * sizeof(u64); > - memcpy(array, sample->branch_stack, sz); > + > + *array++ = sample->branch_stack->nr; > + > + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) { > + if (sample->no_hw_idx) > + *array++ = 0; > + else > + *array++ = sample->branch_stack->hw_idx; > + } > + > + memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); > array = (void *)array + sz; > } > > diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h > index b0edad0c3100..8c7f49f9ccf5 100644 > --- a/tools/perf/util/synthetic-events.h > +++ b/tools/perf/util/synthetic-events.h > @@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_ > int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); > int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine); > int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); > -int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample); > +int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, > + u64 branch_sample_type, const struct perf_sample *sample); > int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); > int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); > int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); > @@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session, > > int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process); > > -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format); > +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, > + u64 read_format, u64 branch_sample_type); > > int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool, > struct target *target, struct perf_thread_map *threads, > -- > 2.54.0.545.g6539524ca2-goog > ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers @ 2026-04-28 7:03 ` Ian Rogers 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2 siblings, 0 replies; 7+ messages in thread From: Ian Rogers @ 2026-04-28 7:03 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Adrian Hunter, James Clark, Leo Yan, Ravi Bangoria, linux-perf-users, linux-kernel, thomas.falcon, dapeng1.mi Cc: Ian Rogers When using "perf inject --itrace=L" to synthesize branch stacks from AUX data, several issues caused failures with the generated file: 1. The synthesized samples were delivered without the PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's sample_type. Fixed by using sample_type | evsel->synth_sample_type in intel_pt_deliver_synth_event. 2. The record layout was misaligned because of inconsistent handling of PERF_SAMPLE_BRANCH_HW_INDEX. Fixed by explicitly writing nr and hw_idx in perf_event__synthesize_sample. 3. Modifying evsel->core.attr.sample_type early in __cmd_inject caused parse failures for subsequent records in the input file. Fixed by moving this modification to just before writing the header. 4. perf_event__repipe_sample was narrowed to only synthesize samples when branch stack injection was requested, and restored the use of perf_inject__cut_auxtrace_sample as a fallback to preserve functionality. 5. Potential Heap Overflow in perf_event__repipe_sample : Addressed by adding a check that prints an error and returns -EFAULT if the calculated event size exceeds PERF_SAMPLE_MAX_SIZE , as you requested. 6. Header vs Payload Mismatch in __cmd_inject : Addressed by narrowing the condition so that HEADER_BRANCH_STACK is only set in the file header if add_last_branch was true. 7. NULL Pointer Dereference in intel-pt.c : Addressed by updating the condition in intel_pt_do_synth_pebs_sample to fill sample. branch_stack if it was synthesized, even if not in the original sample_type . Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/builtin-inject.c | 87 ++++++++++++++++++++++++++++++++----- tools/perf/util/intel-pt.c | 6 ++- 2 files changed, 81 insertions(+), 12 deletions(-) diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index 0c51cb4250d1..1f4d25a0efba 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -223,6 +223,11 @@ static int perf_event__repipe_attr(const struct perf_tool *tool, tool); int ret; + if (inject->itrace_synth_opts.add_last_branch) { + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + ret = perf_event__process_attr(tool, event, pevlist); if (ret) return ret; @@ -375,7 +380,60 @@ static int perf_event__repipe_sample(const struct perf_tool *tool, build_id__mark_dso_hit(tool, event, sample, evsel, machine); - if (inject->itrace_synth_opts.set && sample->aux_sample.size) { + if (inject->itrace_synth_opts.set && + (inject->itrace_synth_opts.last_branch || + inject->itrace_synth_opts.add_last_branch)) { + union perf_event *event_copy = (void *)inject->event_copy; + struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 }; + int err; + size_t sz; + u64 orig_type = evsel->core.attr.sample_type; + u64 orig_branch_type = evsel->core.attr.branch_sample_type; + + if (event_copy == NULL) { + inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE); + if (!inject->event_copy) + return -ENOMEM; + + event_copy = (void *)inject->event_copy; + } + + if (!sample->branch_stack) + sample->branch_stack = &dummy_bs; + + if (inject->itrace_synth_opts.add_last_branch) { + /* Temporarily add in type bits for synthesis. */ + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + + sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type); + + if (sz > PERF_SAMPLE_MAX_SIZE) { + pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE); + return -EFAULT; + } + + event_copy->header.type = PERF_RECORD_SAMPLE; + event_copy->header.misc = event->header.misc; + event_copy->header.size = sz; + + err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + + evsel->core.attr.sample_type = orig_type; + evsel->core.attr.branch_sample_type = orig_branch_type; + + if (err) { + pr_err("Failed to synthesize sample\n"); + return err; + } + event = event_copy; + } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) { event = perf_inject__cut_auxtrace_sample(inject, event, sample); if (IS_ERR(event)) return PTR_ERR(event); @@ -463,13 +521,9 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, - evsel->core.attr.read_format, - evsel->core.attr.branch_sample_type, sample); - if (ret) { - pr_err("Failed to synthesize sample\n"); - return ret; - } + perf_event__synthesize_sample(event_copy, sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); return perf_event__repipe_synth(tool, event_copy); } @@ -2440,12 +2494,25 @@ static int __cmd_inject(struct perf_inject *inject) * synthesized hardware events, so clear the feature flag. */ if (inject->itrace_synth_opts.set) { + struct evsel *evsel; + perf_header__clear_feat(&session->header, HEADER_AUXTRACE); - if (inject->itrace_synth_opts.last_branch || - inject->itrace_synth_opts.add_last_branch) + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + } + + if (inject->itrace_synth_opts.add_last_branch) { perf_header__set_feat(&session->header, HEADER_BRANCH_STACK); + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= + PERF_SAMPLE_BRANCH_HW_INDEX; + } + } } /* diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 5142983e3243..2dce6106c038 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1731,6 +1731,7 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, static int intel_pt_inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { + event->header.type = PERF_RECORD_SAMPLE; event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, /*branch_sample_type=*/0); @@ -2489,7 +2490,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse intel_pt_add_xmm(intr_regs, pos, items, regs_mask); } - if (sample_type & PERF_SAMPLE_BRANCH_STACK) { + if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) { if (items->mask[INTEL_PT_LBR_0_POS] || items->mask[INTEL_PT_LBR_1_POS] || items->mask[INTEL_PT_LBR_2_POS]) { @@ -2560,7 +2561,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse sample.transaction = txn; } - ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type); + ret = intel_pt_deliver_synth_event(pt, event, &sample, + sample_type | evsel->synth_sample_type); perf_sample__exit(&sample); return ret; } -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers @ 2026-04-29 18:11 ` Ian Rogers 2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 2 siblings, 2 replies; 7+ messages in thread From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw) To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, ravi.bangoria, Ian Rogers An intel-pt trace can be turned into LBR events either in perf script or perf inject with the --itrace=L option. With perf inject the generated perf.data file failed to be parsed as the sample events were out of sync with their perf_event_attr. A range of fixes were required. This patch was separated from a large perf script refactor that highlighted the breakage: https://lore.kernel.org/lkml/20260425224951.174663-1-irogers@google.com/ v2: Response to sashiko fixes for patch 2, Namhyung's acked-by for patch 1. v1: https://lore.kernel.org/lkml/20260428070328.1880314-1-irogers@google.com/ Ian Rogers (2): perf event: Fix size of synthesized sample with branch stacks perf inject: Fix itrace branch stack synthesis tools/perf/bench/inject-buildid.c | 9 ++- tools/perf/builtin-inject.c | 107 ++++++++++++++++++++++++++--- tools/perf/tests/dlfilter-test.c | 8 ++- tools/perf/tests/sample-parsing.c | 5 +- tools/perf/util/arm-spe.c | 7 +- tools/perf/util/cs-etm.c | 6 +- tools/perf/util/intel-bts.c | 3 +- tools/perf/util/intel-pt.c | 26 +++++-- tools/perf/util/synthetic-events.c | 25 +++++-- tools/perf/util/synthetic-events.h | 6 +- 10 files changed, 165 insertions(+), 37 deletions(-) -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers @ 2026-04-29 18:11 ` Ian Rogers 2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 1 sibling, 0 replies; 7+ messages in thread From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw) To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, ravi.bangoria, Ian Rogers Synthesizing branch stacks for Intel-PT highlighed an issue where PERF_SAMPLE_BRANCH_HW_INDEX was assumed to always be set in the perf_event_attr branch_sample_type. This caused an incorrect size calculation. Fix the writing of the nr and hw_idx values during sample event synthesis. Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> --- tools/perf/bench/inject-buildid.c | 9 ++++++--- tools/perf/builtin-inject.c | 12 +++++++++--- tools/perf/tests/dlfilter-test.c | 8 ++++++-- tools/perf/tests/sample-parsing.c | 5 +++-- tools/perf/util/arm-spe.c | 7 +++++-- tools/perf/util/cs-etm.c | 6 ++++-- tools/perf/util/intel-bts.c | 3 ++- tools/perf/util/intel-pt.c | 7 +++++-- tools/perf/util/synthetic-events.c | 25 ++++++++++++++++++------- tools/perf/util/synthetic-events.h | 6 ++++-- 10 files changed, 62 insertions(+), 26 deletions(-) diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c index aad572a78d7f..bfd2c5ec9488 100644 --- a/tools/perf/bench/inject-buildid.c +++ b/tools/perf/bench/inject-buildid.c @@ -228,9 +228,12 @@ static ssize_t synthesize_sample(struct bench_data *data, struct bench_dso *dso, event.header.type = PERF_RECORD_SAMPLE; event.header.misc = PERF_RECORD_MISC_USER; - event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0); - - perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample); + event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + perf_event__synthesize_sample(&event, bench_sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); return writen(data->input_pipe[1], &event, event.header.size); } diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index f174bc69cec4..0c51cb4250d1 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -463,8 +463,13 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - perf_event__synthesize_sample(event_copy, sample_type, - evsel->core.attr.read_format, sample); + ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + if (ret) { + pr_err("Failed to synthesize sample\n"); + return ret; + } return perf_event__repipe_synth(tool, event_copy); } @@ -1100,7 +1105,8 @@ static int perf_inject__sched_stat(const struct perf_tool *tool, sample_sw.period = sample->period; sample_sw.time = sample->time; perf_event__synthesize_sample(event_sw, evsel->core.attr.sample_type, - evsel->core.attr.read_format, &sample_sw); + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, &sample_sw); build_id__mark_dso_hit(tool, event_sw, &sample_sw, evsel, machine); ret = perf_event__repipe(tool, event_sw, &sample_sw, machine); perf_sample__exit(&sample_sw); diff --git a/tools/perf/tests/dlfilter-test.c b/tools/perf/tests/dlfilter-test.c index e63790c61d53..204663571943 100644 --- a/tools/perf/tests/dlfilter-test.c +++ b/tools/perf/tests/dlfilter-test.c @@ -188,8 +188,12 @@ static int write_sample(struct test_data *td, u64 sample_type, u64 id, pid_t pid event->header.type = PERF_RECORD_SAMPLE; event->header.misc = PERF_RECORD_MISC_USER; - event->header.size = perf_event__sample_event_size(&sample, sample_type, 0); - err = perf_event__synthesize_sample(event, sample_type, 0, &sample); + event->header.size = perf_event__sample_event_size(&sample, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0); + err = perf_event__synthesize_sample(event, sample_type, + /*read_format=*/0, + /*branch_sample_type=*/0, &sample); if (err) return test_result("perf_event__synthesize_sample() failed", TEST_FAIL); diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c index a7327c942ca2..55f0b73ca20e 100644 --- a/tools/perf/tests/sample-parsing.c +++ b/tools/perf/tests/sample-parsing.c @@ -310,7 +310,8 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) sample.read.one.lost = 1; } - sz = perf_event__sample_event_size(&sample, sample_type, read_format); + sz = perf_event__sample_event_size(&sample, sample_type, read_format, + evsel.core.attr.branch_sample_type); bufsz = sz + 4096; /* Add a bit for overrun checking */ event = malloc(bufsz); if (!event) { @@ -324,7 +325,7 @@ static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) event->header.size = sz; err = perf_event__synthesize_sample(event, sample_type, read_format, - &sample); + evsel.core.attr.branch_sample_type, &sample); if (err) { pr_debug("%s failed for sample_type %#"PRIx64", error %d\n", "perf_event__synthesize_sample", sample_type, err); diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c index e5835042acdf..c4ed9f10e731 100644 --- a/tools/perf/util/arm-spe.c +++ b/tools/perf/util/arm-spe.c @@ -484,8 +484,11 @@ static void arm_spe__prep_branch_stack(struct arm_spe_queue *speq) static int arm_spe__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.type = PERF_RECORD_SAMPLE; + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c index 8a639d2e51a4..1ebc1a6a5e75 100644 --- a/tools/perf/util/cs-etm.c +++ b/tools/perf/util/cs-etm.c @@ -1425,8 +1425,10 @@ static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq, static int cs_etm__inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c index 382255393fb3..0b18ebd13f7c 100644 --- a/tools/perf/util/intel-bts.c +++ b/tools/perf/util/intel-bts.c @@ -303,7 +303,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq, event.sample.header.size = bts->branches_event_size; ret = perf_event__synthesize_sample(&event, bts->branches_sample_type, - 0, &sample); + /*read_format=*/0, /*branch_sample_type=*/0, + &sample); if (ret) return ret; } diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index fc9eec8b54b8..5142983e3243 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1731,8 +1731,11 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, static int intel_pt_inject_event(union perf_event *event, struct perf_sample *sample, u64 type) { - event->header.size = perf_event__sample_event_size(sample, type, 0); - return perf_event__synthesize_sample(event, type, 0, sample); + event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, + /*branch_sample_type=*/0); + + return perf_event__synthesize_sample(event, type, /*read_format=*/0, + /*branch_sample_type=*/0, sample); } static inline int intel_pt_opt_inject(struct intel_pt *pt, diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c index 85bee747f4cd..2461f25a4d7d 100644 --- a/tools/perf/util/synthetic-events.c +++ b/tools/perf/util/synthetic-events.c @@ -1455,7 +1455,8 @@ int perf_event__synthesize_stat_round(const struct perf_tool *tool, return process(tool, (union perf_event *) &event, NULL, machine); } -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format) +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format, + u64 branch_sample_type) { size_t sz, result = sizeof(struct perf_record_sample); @@ -1515,8 +1516,10 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); + /* nr */ + sz += sizeof(u64); + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) + sz += sizeof(u64); result += sz; } @@ -1605,7 +1608,7 @@ static __u64 *copy_read_group_values(__u64 *array, __u64 read_format, } int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, - const struct perf_sample *sample) + u64 branch_sample_type, const struct perf_sample *sample) { __u64 *array; size_t sz; @@ -1719,9 +1722,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo if (type & PERF_SAMPLE_BRANCH_STACK) { sz = sample->branch_stack->nr * sizeof(struct branch_entry); - /* nr, hw_idx */ - sz += 2 * sizeof(u64); - memcpy(array, sample->branch_stack, sz); + + *array++ = sample->branch_stack->nr; + + if (branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) { + if (sample->no_hw_idx) + *array++ = 0; + else + *array++ = sample->branch_stack->hw_idx; + } + + memcpy(array, perf_sample__branch_entries((struct perf_sample *)sample), sz); array = (void *)array + sz; } diff --git a/tools/perf/util/synthetic-events.h b/tools/perf/util/synthetic-events.h index b0edad0c3100..8c7f49f9ccf5 100644 --- a/tools/perf/util/synthetic-events.h +++ b/tools/perf/util/synthetic-events.h @@ -81,7 +81,8 @@ int perf_event__synthesize_mmap_events(const struct perf_tool *tool, union perf_ int perf_event__synthesize_modules(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_namespaces(const struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_cgroups(const struct perf_tool *tool, perf_event__handler_t process, struct machine *machine); -int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, const struct perf_sample *sample); +int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_format, + u64 branch_sample_type, const struct perf_sample *sample); int perf_event__synthesize_stat_config(const struct perf_tool *tool, struct perf_stat_config *config, perf_event__handler_t process, struct machine *machine); int perf_event__synthesize_stat_events(struct perf_stat_config *config, const struct perf_tool *tool, struct evlist *evlist, perf_event__handler_t process, bool attrs); int perf_event__synthesize_stat_round(const struct perf_tool *tool, u64 time, u64 type, perf_event__handler_t process, struct machine *machine); @@ -97,7 +98,8 @@ void perf_event__synthesize_final_bpf_metadata(struct perf_session *session, int perf_tool__process_synth_event(const struct perf_tool *tool, union perf_event *event, struct machine *machine, perf_event__handler_t process); -size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, u64 read_format); +size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type, + u64 read_format, u64 branch_sample_type); int __machine__synthesize_threads(struct machine *machine, const struct perf_tool *tool, struct target *target, struct perf_thread_map *threads, -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers @ 2026-04-29 18:11 ` Ian Rogers 1 sibling, 0 replies; 7+ messages in thread From: Ian Rogers @ 2026-04-29 18:11 UTC (permalink / raw) To: acme, adrian.hunter, dapeng1.mi, namhyung, thomas.falcon Cc: james.clark, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, ravi.bangoria, Ian Rogers When using "perf inject --itrace=L" to synthesize branch stacks from AUX data, several issues caused failures with the generated file: 1. The synthesized samples were delivered without the PERF_SAMPLE_BRANCH_STACK flag if it was not in the original event's sample_type. Fixed by using sample_type | evsel->synth_sample_type in intel_pt_do_synth_pebs_sample. 2. Modifying evsel->core.attr.sample_type early in __cmd_inject caused parse failures for subsequent records in the input file. Fixed by moving this modification to just before writing the header. 3. perf_event__repipe_sample was narrowed to only synthesize samples when branch stack injection was requested, and restored the use of perf_inject__cut_auxtrace_sample as a fallback to preserve functionality. 4. Potential Heap Overflow in perf_event__repipe_sample: Addressed by adding a check that prints an error and returns -EFAULT if the calculated event size exceeds PERF_SAMPLE_MAX_SIZE. 5. Header vs Payload Mismatch in __cmd_inject: Addressed by narrowing the condition so that HEADER_BRANCH_STACK is only set in the file header if add_last_branch was true. 6. NULL Pointer Dereference in intel-pt.c: Addressed by updating the condition in intel_pt_do_synth_pebs_sample to fill sample. branch_stack if it was synthesized, even if not in the original sample_type. 7. Modifying event attributes in perf_event__repipe_attr early caused downstream parser breakage in pipe mode. Fixed by clearing PERF_SAMPLE_AUX when itrace_synth_opts.set is true to ensure the headers match the stripped payload. Also added a size check against struct perf_record_header_attr to prevent out-of-bounds writes. 8. Potential dangling pointer vulnerability in perf_event__repipe_sample: Addressed by restoring the original sample->branch_stack pointer before returning. 9. Off-by-one error in sample size check in perf_event__repipe_sample: Fixed by checking if sz >= PERF_SAMPLE_MAX_SIZE instead of >. 10. Unadvertised size field left in payload by cut_auxtrace_sample: Addressed by excluding the 8-byte size field from the copied payload to correctly match the cleared PERF_SAMPLE_AUX bit. 11. Silent data corruption in convert_sample_callchain: Addressed by restoring the error check for perf_event__synthesize_sample to prevent writing malformed or uninitialized memory. 12. Omission of hw_idx in PEBS synthesis: Fixed by passing the correct evsel->core.attr.branch_sample_type instead of 0 in intel-pt.c. Assisted-by: Gemini:gemini-3.1-pro-preview Signed-off-by: Ian Rogers <irogers@google.com> --- v2: - Fix error handling in callchain conversion by keeping the error check. - Fix header advertise issues in pipe mode by clearing PERF_SAMPLE_AUX. - Prevent out-of-bounds writes in attribute repiping with size checks. - Fix sample payload misalignment by cutting the unadvertised AUX size field. - Fix off-by-one check in sample size overflow to correctly use >=. - Avoid dangling pointer vulnerability by restoring the branch_stack pointer. - Pass evsel's branch_sample_type to synthesized samples to prevent hw_idx omission. --- tools/perf/builtin-inject.c | 97 ++++++++++++++++++++++++++++++++++--- tools/perf/util/intel-pt.c | 23 ++++++--- 2 files changed, 106 insertions(+), 14 deletions(-) diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c index 0c51cb4250d1..13fb1f05ceb2 100644 --- a/tools/perf/builtin-inject.c +++ b/tools/perf/builtin-inject.c @@ -223,6 +223,19 @@ static int perf_event__repipe_attr(const struct perf_tool *tool, tool); int ret; + if (event->header.size < sizeof(struct perf_record_header_attr)) { + pr_err("Attribute event size %u is too small\n", event->header.size); + return -EINVAL; + } + + if (inject->itrace_synth_opts.set) + event->attr.attr.sample_type &= ~PERF_SAMPLE_AUX; + + if (inject->itrace_synth_opts.add_last_branch) { + event->attr.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + event->attr.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + ret = perf_event__process_attr(tool, event, pevlist); if (ret) return ret; @@ -330,8 +343,8 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject, union perf_event *event, struct perf_sample *sample) { - size_t sz1 = sample->aux_sample.data - (void *)event; - size_t sz2 = event->header.size - sample->aux_sample.size - sz1; + size_t sz1 = sample->aux_sample.data - (void *)event - sizeof(u64); + size_t sz2 = event->header.size - sample->aux_sample.size - (sz1 + sizeof(u64)); union perf_event *ev; if (inject->event_copy == NULL) { @@ -342,13 +355,12 @@ perf_inject__cut_auxtrace_sample(struct perf_inject *inject, ev = (union perf_event *)inject->event_copy; if (sz1 > event->header.size || sz2 > event->header.size || sz1 + sz2 > event->header.size || - sz1 < sizeof(struct perf_event_header) + sizeof(u64)) + sz1 < sizeof(struct perf_event_header)) return event; memcpy(ev, event, sz1); memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2); ev->header.size = sz1 + sz2; - ((u64 *)((void *)ev + sz1))[-1] = 0; return ev; } @@ -375,7 +387,63 @@ static int perf_event__repipe_sample(const struct perf_tool *tool, build_id__mark_dso_hit(tool, event, sample, evsel, machine); - if (inject->itrace_synth_opts.set && sample->aux_sample.size) { + if (inject->itrace_synth_opts.set && + (inject->itrace_synth_opts.last_branch || + inject->itrace_synth_opts.add_last_branch)) { + union perf_event *event_copy = (void *)inject->event_copy; + struct branch_stack dummy_bs = { .nr = 0, .hw_idx = 0 }; + int err; + size_t sz; + u64 orig_type = evsel->core.attr.sample_type; + u64 orig_branch_type = evsel->core.attr.branch_sample_type; + + struct branch_stack *orig_bs = sample->branch_stack; + + if (event_copy == NULL) { + inject->event_copy = malloc(PERF_SAMPLE_MAX_SIZE); + if (!inject->event_copy) + return -ENOMEM; + + event_copy = (void *)inject->event_copy; + } + + if (!sample->branch_stack) + sample->branch_stack = &dummy_bs; + + if (inject->itrace_synth_opts.add_last_branch) { + /* Temporarily add in type bits for synthesis. */ + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_HW_INDEX; + } + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + + sz = perf_event__sample_event_size(sample, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type); + + if (sz >= PERF_SAMPLE_MAX_SIZE) { + pr_err("Sample size %zu exceeds max size %d\n", sz, PERF_SAMPLE_MAX_SIZE); + return -EFAULT; + } + + event_copy->header.type = PERF_RECORD_SAMPLE; + event_copy->header.misc = event->header.misc; + event_copy->header.size = sz; + + err = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + evsel->core.attr.read_format, + evsel->core.attr.branch_sample_type, sample); + + evsel->core.attr.sample_type = orig_type; + evsel->core.attr.branch_sample_type = orig_branch_type; + sample->branch_stack = orig_bs; + + if (err) { + pr_err("Failed to synthesize sample\n"); + return err; + } + event = event_copy; + } else if (inject->itrace_synth_opts.set && sample->aux_sample.size) { event = perf_inject__cut_auxtrace_sample(inject, event, sample); if (IS_ERR(event)) return PTR_ERR(event); @@ -463,7 +531,7 @@ static int perf_event__convert_sample_callchain(const struct perf_tool *tool, /* remove sample_type {STACK,REGS}_USER for synthesize */ sample_type &= ~(PERF_SAMPLE_STACK_USER | PERF_SAMPLE_REGS_USER); - ret = perf_event__synthesize_sample(event_copy, evsel->core.attr.sample_type, + ret = perf_event__synthesize_sample(event_copy, sample_type, evsel->core.attr.read_format, evsel->core.attr.branch_sample_type, sample); if (ret) { @@ -2440,12 +2508,25 @@ static int __cmd_inject(struct perf_inject *inject) * synthesized hardware events, so clear the feature flag. */ if (inject->itrace_synth_opts.set) { + struct evsel *evsel; + perf_header__clear_feat(&session->header, HEADER_AUXTRACE); - if (inject->itrace_synth_opts.last_branch || - inject->itrace_synth_opts.add_last_branch) + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type &= ~PERF_SAMPLE_AUX; + } + + if (inject->itrace_synth_opts.add_last_branch) { perf_header__set_feat(&session->header, HEADER_BRANCH_STACK); + + evlist__for_each_entry(session->evlist, evsel) { + evsel->core.attr.sample_type |= PERF_SAMPLE_BRANCH_STACK; + evsel->core.attr.branch_sample_type |= + PERF_SAMPLE_BRANCH_HW_INDEX; + } + } } /* diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 5142983e3243..f45dbdd4d323 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1728,14 +1728,24 @@ static void intel_pt_prep_b_sample(struct intel_pt *pt, event->sample.header.misc = sample->cpumode; } -static int intel_pt_inject_event(union perf_event *event, +static int intel_pt_inject_event(struct intel_pt *pt, union perf_event *event, struct perf_sample *sample, u64 type) { + struct evsel *evsel = NULL; + u64 branch_sample_type = 0; + + if (pt->session && pt->session->evlist) + evsel = evlist__id2evsel(pt->session->evlist, sample->id); + + if (evsel) + branch_sample_type = evsel->core.attr.branch_sample_type; + + event->header.type = PERF_RECORD_SAMPLE; event->header.size = perf_event__sample_event_size(sample, type, /*read_format=*/0, - /*branch_sample_type=*/0); + branch_sample_type); return perf_event__synthesize_sample(event, type, /*read_format=*/0, - /*branch_sample_type=*/0, sample); + branch_sample_type, sample); } static inline int intel_pt_opt_inject(struct intel_pt *pt, @@ -1745,7 +1755,7 @@ static inline int intel_pt_opt_inject(struct intel_pt *pt, if (!pt->synth_opts.inject) return 0; - return intel_pt_inject_event(event, sample, type); + return intel_pt_inject_event(pt, event, sample, type); } static int intel_pt_deliver_synth_event(struct intel_pt *pt, @@ -2489,7 +2499,7 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse intel_pt_add_xmm(intr_regs, pos, items, regs_mask); } - if (sample_type & PERF_SAMPLE_BRANCH_STACK) { + if ((sample_type | evsel->synth_sample_type) & PERF_SAMPLE_BRANCH_STACK) { if (items->mask[INTEL_PT_LBR_0_POS] || items->mask[INTEL_PT_LBR_1_POS] || items->mask[INTEL_PT_LBR_2_POS]) { @@ -2560,7 +2570,8 @@ static int intel_pt_do_synth_pebs_sample(struct intel_pt_queue *ptq, struct evse sample.transaction = txn; } - ret = intel_pt_deliver_synth_event(pt, event, &sample, sample_type); + ret = intel_pt_deliver_synth_event(pt, event, &sample, + sample_type | evsel->synth_sample_type); perf_sample__exit(&sample); return ret; } -- 2.54.0.545.g6539524ca2-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-04-29 18:12 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-28 7:03 [PATCH v1 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-28 7:03 ` [PATCH v1 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-28 23:19 ` Namhyung Kim 2026-04-28 7:03 ` [PATCH v1 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers 2026-04-29 18:11 ` [PATCH v2 0/2] perf inject intel-PT LBR/brstack synthesis fixes Ian Rogers 2026-04-29 18:11 ` [PATCH v2 1/2] perf event: Fix size of synthesized sample with branch stacks Ian Rogers 2026-04-29 18:11 ` [PATCH v2 2/2] perf inject: Fix itrace branch stack synthesis Ian Rogers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox