* [PATCHSET v5 0/6] perf tools: Add deferred callchain support
@ 2025-11-20 2:10 Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 2:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf
Hello,
This is a new version of the deferred callchain support, now that the
kernel part has been merged into the tip tree. It is based on Steve's
work (v16).
https://lore.kernel.org/r/20250908175319.841517121@kernel.org
v5 changes:
* update delegate tools (Ian)
* copy and flush remaining samples (Ian)
* add Ian's Reviewed-by tags
v4: https://lore.kernel.org/r/20251115234106.348571-1-namhyung@kernel.org
* add --call-graph fp,defer option (Ian, Steve)
* add more comment on the cookie (Ian)
* display cookie part in the deferred callchain (Ian)
v3: https://lore.kernel.org/r/20251114070018.160330-1-namhyung@kernel.org
* handle new attr.defer_output to generate deferred callchains
* fix crash when cookies don't match (Steven)
* disable merging for perf inject
* fix missing feature detection bug
* symbolize merged callchains properly
Here's an example session.
$ perf record --call-graph fp,defer pwd
/home/namhyung/project/linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB perf.data (29 samples) ]
$ perf evlist -v
cpu/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES),
{ sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD,
read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1,
task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1,
defer_callchain: 1, defer_output: 1
$ perf script
...
pwd 2312 121.163435: 249113 cpu/cycles/P:
ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
...
$ perf script --no-merge-callchains
...
pwd 2312 121.163435: 249113 cpu/cycles/P:
ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
b00000006 (cookie) ([unknown])
pwd 2312 121.163447: DEFERRED CALLCHAIN [cookie: b00000006]
7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
...
The code is available at 'perf/defer-callchain-v5' branch in
git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Thanks,
Namhyung
Namhyung Kim (6):
tools headers UAPI: Sync linux/perf_event.h for deferred callchains
perf tools: Minimal DEFERRED_CALLCHAIN support
perf record: Add --call-graph fp,defer option for deferred callchains
perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
perf tools: Merge deferred user callchains
perf tools: Flush remaining samples w/o deferred callchains
tools/include/uapi/linux/perf_event.h | 21 ++-
tools/lib/perf/include/perf/event.h | 13 ++
tools/perf/Documentation/perf-config.txt | 3 +
tools/perf/Documentation/perf-record.txt | 4 +
tools/perf/Documentation/perf-script.txt | 5 +
tools/perf/builtin-inject.c | 1 +
tools/perf/builtin-report.c | 1 +
tools/perf/builtin-script.c | 93 +++++++++++
tools/perf/util/callchain.c | 45 +++++-
tools/perf/util/callchain.h | 4 +
tools/perf/util/event.c | 1 +
tools/perf/util/evlist.c | 1 +
tools/perf/util/evlist.h | 2 +
tools/perf/util/evsel.c | 50 +++++-
tools/perf/util/evsel.h | 1 +
tools/perf/util/evsel_fprintf.c | 5 +-
tools/perf/util/machine.c | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 2 +
tools/perf/util/sample.h | 2 +
tools/perf/util/session.c | 183 ++++++++++++++++++++++
tools/perf/util/tool.c | 5 +
tools/perf/util/tool.h | 4 +-
22 files changed, 439 insertions(+), 8 deletions(-)
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v5 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains
2025-11-20 2:10 [PATCHSET v5 0/6] perf tools: Add deferred callchain support Namhyung Kim
@ 2025-11-20 2:10 ` Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support Namhyung Kim
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 2:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf
Sync the header with the kernel to support the upcoming user space
changes for deferred callchains.
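For readers of the new record, the layout documented in the header comment
below maps to a plain struct like this (a sketch; the tools-side definition
is only added in the next patch):

struct perf_record_callchain_deferred {
        struct perf_event_header header;
        __u64 cookie;   /* matches the cookie saved in the original sample */
        __u64 nr;
        __u64 ips[];    /* nr entries of the deferred user callchain */
        /* followed by struct sample_id when sample_id_all is set */
};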
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/include/uapi/linux/perf_event.h | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 78a362b8002776e5..d292f96bc06f86bc 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -463,7 +463,9 @@ struct perf_event_attr {
inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
remove_on_exec : 1, /* event is removed from task on exec */
sigtrap : 1, /* send synchronous SIGTRAP on event */
- __reserved_1 : 26;
+ defer_callchain: 1, /* request PERF_RECORD_CALLCHAIN_DEFERRED records */
+ defer_output : 1, /* output PERF_RECORD_CALLCHAIN_DEFERRED records */
+ __reserved_1 : 24;
union {
__u32 wakeup_events; /* wake up every n events */
@@ -1239,6 +1241,22 @@ enum perf_event_type {
*/
PERF_RECORD_AUX_OUTPUT_HW_ID = 21,
+ /*
+ * This user callchain capture was deferred until shortly before
+ * returning to user space. Previous samples would have kernel
+ * callchains only and they need to be stitched with this to make full
+ * callchains.
+ *
+ * struct {
+ * struct perf_event_header header;
+ * u64 cookie;
+ * u64 nr;
+ * u64 ips[nr];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_CALLCHAIN_DEFERRED = 22,
+
PERF_RECORD_MAX, /* non-ABI */
};
@@ -1269,6 +1287,7 @@ enum perf_callchain_context {
PERF_CONTEXT_HV = (__u64)-32,
PERF_CONTEXT_KERNEL = (__u64)-128,
PERF_CONTEXT_USER = (__u64)-512,
+ PERF_CONTEXT_USER_DEFERRED = (__u64)-640,
PERF_CONTEXT_GUEST = (__u64)-2048,
PERF_CONTEXT_GUEST_KERNEL = (__u64)-2176,
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v5 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support
2025-11-20 2:10 [PATCHSET v5 0/6] perf tools: Add deferred callchain support Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
@ 2025-11-20 2:10 ` Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 2:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf
Add a new event type for deferred callchains and a new callback in
struct perf_tool. For now it doesn't actually handle the deferred
callchains; it just marks the sample if the callchain array contains
PERF_CONTEXT_USER_DEFERRED.
With this change, perf report can at least dump the raw data. Enabling
attr.defer_callchain requires the next commit, but if you already have a
data file, it will show the following result.
$ perf report -D
...
0x2158@perf.data [0x40]: event: 22
.
. ... raw event: size 64 bytes
. 0000: 16 00 00 00 02 00 40 00 06 00 00 00 0b 00 00 00 ......@.........
. 0010: 03 00 00 00 00 00 00 00 a7 7f 33 fe 18 7f 00 00 ..........3.....
. 0020: 0f 0e 33 fe 18 7f 00 00 48 14 33 fe 18 7f 00 00 ..3.....H.3.....
. 0030: 08 09 00 00 08 09 00 00 e6 7a e7 35 1c 00 00 00 .........z.5....
121163447014 0x2158 [0x40]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 2312/2312: 0xb00000006
... FP chain: nr:3
..... 0: 00007f18fe337fa7
..... 1: 00007f18fe330e0f
..... 2: 00007f18fe331448
: unhandled!
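In short, the marker detection added to evsel__parse_sample() below
reduces to this (names as in the patch):

        u64 nr = data->callchain->nr;

        /*
         * With attr.defer_callchain, the last two callchain entries are the
         * PERF_CONTEXT_USER_DEFERRED marker and the cookie that a later
         * PERF_RECORD_CALLCHAIN_DEFERRED record will carry.
         */
        if (evsel->core.attr.defer_callchain && nr >= 2 &&
            data->callchain->ips[nr - 2] == PERF_CONTEXT_USER_DEFERRED) {
                data->deferred_cookie = data->callchain->ips[nr - 1];
                data->deferred_callchain = true;
        }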
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/lib/perf/include/perf/event.h | 13 ++++++++++
tools/perf/util/event.c | 1 +
tools/perf/util/evsel.c | 31 +++++++++++++++++++++--
tools/perf/util/machine.c | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 2 ++
tools/perf/util/sample.h | 2 ++
tools/perf/util/session.c | 20 +++++++++++++++
tools/perf/util/tool.c | 3 +++
tools/perf/util/tool.h | 3 ++-
9 files changed, 73 insertions(+), 3 deletions(-)
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index aa1e91c97a226e1a..43a8cb04994fa033 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -151,6 +151,18 @@ struct perf_record_switch {
__u32 next_prev_tid;
};
+struct perf_record_callchain_deferred {
+ struct perf_event_header header;
+ /*
+ * This is to match kernel and (deferred) user stacks together.
+ * The kernel part will be in the sample callchain array after
+ * the PERF_CONTEXT_USER_DEFERRED entry.
+ */
+ __u64 cookie;
+ __u64 nr;
+ __u64 ips[];
+};
+
struct perf_record_header_attr {
struct perf_event_header header;
struct perf_event_attr attr;
@@ -523,6 +535,7 @@ union perf_event {
struct perf_record_read read;
struct perf_record_throttle throttle;
struct perf_record_sample sample;
+ struct perf_record_callchain_deferred callchain_deferred;
struct perf_record_bpf_event bpf;
struct perf_record_ksymbol ksymbol;
struct perf_record_text_poke_event text_poke;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index fcf44149feb20c35..4c92cc1a952c1d9f 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -61,6 +61,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_CGROUP] = "CGROUP",
[PERF_RECORD_TEXT_POKE] = "TEXT_POKE",
[PERF_RECORD_AUX_OUTPUT_HW_ID] = "AUX_OUTPUT_HW_ID",
+ [PERF_RECORD_CALLCHAIN_DEFERRED] = "CALLCHAIN_DEFERRED",
[PERF_RECORD_HEADER_ATTR] = "ATTR",
[PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 989c56d4a23f74f4..5ee3e7dee93fbbcb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3089,6 +3089,20 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
data->data_src = PERF_MEM_DATA_SRC_NONE;
data->vcpu = -1;
+ if (event->header.type == PERF_RECORD_CALLCHAIN_DEFERRED) {
+ const u64 max_callchain_nr = UINT64_MAX / sizeof(u64);
+
+ data->callchain = (struct ip_callchain *)&event->callchain_deferred.nr;
+ if (data->callchain->nr > max_callchain_nr)
+ return -EFAULT;
+
+ data->deferred_cookie = event->callchain_deferred.cookie;
+
+ if (evsel->core.attr.sample_id_all)
+ perf_evsel__parse_id_sample(evsel, event, data);
+ return 0;
+ }
+
if (event->header.type != PERF_RECORD_SAMPLE) {
if (!evsel->core.attr.sample_id_all)
return 0;
@@ -3213,12 +3227,25 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
if (type & PERF_SAMPLE_CALLCHAIN) {
const u64 max_callchain_nr = UINT64_MAX / sizeof(u64);
+ u64 callchain_nr;
OVERFLOW_CHECK_u64(array);
data->callchain = (struct ip_callchain *)array++;
- if (data->callchain->nr > max_callchain_nr)
+ callchain_nr = data->callchain->nr;
+ if (callchain_nr > max_callchain_nr)
return -EFAULT;
- sz = data->callchain->nr * sizeof(u64);
+ sz = callchain_nr * sizeof(u64);
+ /*
+ * Save the cookie for the deferred user callchain. The last 2
+ * entries in the callchain should be the context marker and the
+ * cookie. The cookie will be used to match PERF_RECORD_
+ * CALLCHAIN_DEFERRED later.
+ */
+ if (evsel->core.attr.defer_callchain && callchain_nr >= 2 &&
+ data->callchain->ips[callchain_nr - 2] == PERF_CONTEXT_USER_DEFERRED) {
+ data->deferred_cookie = data->callchain->ips[callchain_nr - 1];
+ data->deferred_callchain = true;
+ }
OVERFLOW_CHECK(array, sz, max_size);
array = (void *)array + sz;
}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b5dd42588c916d91..841b711d970e9457 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2124,6 +2124,7 @@ static int add_callchain_ip(struct thread *thread,
*cpumode = PERF_RECORD_MISC_KERNEL;
break;
case PERF_CONTEXT_USER:
+ case PERF_CONTEXT_USER_DEFERRED:
*cpumode = PERF_RECORD_MISC_USER;
break;
default:
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 66b666d9ce649dd7..741c3d657a8b6ae7 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -343,6 +343,8 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(inherit_thread, p_unsigned);
PRINT_ATTRf(remove_on_exec, p_unsigned);
PRINT_ATTRf(sigtrap, p_unsigned);
+ PRINT_ATTRf(defer_callchain, p_unsigned);
+ PRINT_ATTRf(defer_output, p_unsigned);
PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned, false);
PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index fae834144ef42105..a8307b20a9ea8066 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -107,6 +107,8 @@ struct perf_sample {
/** @weight3: On x86 holds retire_lat, on powerpc holds p_stage_cyc. */
u16 weight3;
bool no_hw_idx; /* No hw_idx collected in branch_stack */
+ bool deferred_callchain; /* Has deferred user callchains */
+ u64 deferred_cookie;
char insn[MAX_INSN];
void *raw_data;
struct ip_callchain *callchain;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b0236b2df2913e1..361e15c1f26a96d0 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -720,6 +720,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
[PERF_RECORD_CGROUP] = perf_event__cgroup_swap,
[PERF_RECORD_TEXT_POKE] = perf_event__text_poke_swap,
[PERF_RECORD_AUX_OUTPUT_HW_ID] = perf_event__all64_swap,
+ [PERF_RECORD_CALLCHAIN_DEFERRED] = perf_event__all64_swap,
[PERF_RECORD_HEADER_ATTR] = perf_event__hdr_attr_swap,
[PERF_RECORD_HEADER_EVENT_TYPE] = perf_event__event_type_swap,
[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -854,6 +855,9 @@ static void callchain__printf(struct evsel *evsel,
for (i = 0; i < callchain->nr; i++)
printf("..... %2d: %016" PRIx64 "\n",
i, callchain->ips[i]);
+
+ if (sample->deferred_callchain)
+ printf("...... (deferred)\n");
}
static void branch_stack__printf(struct perf_sample *sample,
@@ -1123,6 +1127,19 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
sample_read__printf(sample, evsel->core.attr.read_format);
}
+static void dump_deferred_callchain(struct evsel *evsel, union perf_event *event,
+ struct perf_sample *sample)
+{
+ if (!dump_trace)
+ return;
+
+ printf("(IP, 0x%x): %d/%d: %#" PRIx64 "\n",
+ event->header.misc, sample->pid, sample->tid, sample->deferred_cookie);
+
+ if (evsel__has_callchain(evsel))
+ callchain__printf(evsel, sample);
+}
+
static void dump_read(struct evsel *evsel, union perf_event *event)
{
struct perf_record_read *read_event = &event->read;
@@ -1353,6 +1370,9 @@ static int machines__deliver_event(struct machines *machines,
return tool->text_poke(tool, event, sample, machine);
case PERF_RECORD_AUX_OUTPUT_HW_ID:
return tool->aux_output_hw_id(tool, event, sample, machine);
+ case PERF_RECORD_CALLCHAIN_DEFERRED:
+ dump_deferred_callchain(evsel, event, sample);
+ return tool->callchain_deferred(tool, event, sample, evsel, machine);
default:
++evlist->stats.nr_unknown_events;
return -1;
diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c
index 22a8a4ffe05f778e..e77f0e2ecc1f79db 100644
--- a/tools/perf/util/tool.c
+++ b/tools/perf/util/tool.c
@@ -287,6 +287,7 @@ void perf_tool__init(struct perf_tool *tool, bool ordered_events)
tool->read = process_event_sample_stub;
tool->throttle = process_event_stub;
tool->unthrottle = process_event_stub;
+ tool->callchain_deferred = process_event_sample_stub;
tool->attr = process_event_synth_attr_stub;
tool->event_update = process_event_synth_event_update_stub;
tool->tracing_data = process_event_synth_tracing_data_stub;
@@ -335,6 +336,7 @@ bool perf_tool__compressed_is_stub(const struct perf_tool *tool)
}
CREATE_DELEGATE_SAMPLE(read);
CREATE_DELEGATE_SAMPLE(sample);
+CREATE_DELEGATE_SAMPLE(callchain_deferred);
#define CREATE_DELEGATE_ATTR(name) \
static int delegate_ ## name(const struct perf_tool *tool, \
@@ -468,6 +470,7 @@ void delegate_tool__init(struct delegate_tool *tool, struct perf_tool *delegate)
tool->tool.ksymbol = delegate_ksymbol;
tool->tool.bpf = delegate_bpf;
tool->tool.text_poke = delegate_text_poke;
+ tool->tool.callchain_deferred = delegate_callchain_deferred;
tool->tool.attr = delegate_attr;
tool->tool.event_update = delegate_event_update;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 88337cee1e3e2be3..9b9f0a8cbf3de4b5 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -44,7 +44,8 @@ enum show_feature_header {
struct perf_tool {
event_sample sample,
- read;
+ read,
+ callchain_deferred;
event_op mmap,
mmap2,
comm,
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v5 3/6] perf record: Add --call-graph fp,defer option for deferred callchains
2025-11-20 2:10 [PATCHSET v5 0/6] perf tools: Add deferred callchain support Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support Namhyung Kim
@ 2025-11-20 2:10 ` Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Namhyung Kim
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 2:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf
Add a new callchain record mode option for deferred callchains. For now
it only works with FP (frame-pointer) mode.
Also add the missing-feature detection logic to clear the flag on old
kernels.
$ perf record --call-graph fp,defer -vv true
...
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CALLCHAIN|PERIOD
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
defer_callchain 1
defer_output 1
------------------------------------------------------------
sys_perf_event_open: pid 162755 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off deferred callchain support
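The fallback shown above boils down to the following (condensed from the
evsel.c changes below): probe the new attr bit once, remember that it is
missing, and clear both bits before retrying the open.

        /* feature detection: old kernels reject the unknown attr bit */
        attr.defer_callchain = true;
        if (!has_attr_feature(&attr, /*flags=*/0)) {
                perf_missing_features.defer_callchain = true;
                pr_debug2("switching off deferred callchain support\n");
        }

        /* later, before retrying sys_perf_event_open() */
        if (perf_missing_features.defer_callchain) {
                evsel->core.attr.defer_callchain = 0;
                evsel->core.attr.defer_output = 0;
        }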
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/Documentation/perf-config.txt | 3 +++
tools/perf/Documentation/perf-record.txt | 4 ++++
tools/perf/util/callchain.c | 16 +++++++++++++---
tools/perf/util/callchain.h | 1 +
tools/perf/util/evsel.c | 19 +++++++++++++++++++
tools/perf/util/evsel.h | 1 +
6 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt
index c6f33565966735fe..642d1c490d9e3bcd 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -452,6 +452,9 @@ Variables
kernel space is controlled not by this option but by the
kernel config (CONFIG_UNWINDER_*).
+ The 'defer' mode can be used with 'fp' mode to enable deferred
+ user callchains (like 'fp,defer').
+
call-graph.dump-size::
The size of stack to dump in order to do post-unwinding. Default is 8192 (byte).
When using dwarf into record-mode, the default size will be used if omitted.
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 067891bd7da6edc8..e8b9aadbbfa50574 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -325,6 +325,10 @@ OPTIONS
by default. User can change the number by passing it after comma
like "--call-graph fp,32".
+ Also "defer" can be used with "fp" (like "--call-graph fp,defer") to
+ enable deferred user callchain which will collect user-space callchains
+ when the thread returns to the user space.
+
-q::
--quiet::
Don't print any warnings or messages, useful for scripting.
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index d7b7eef740b9d6ed..2884187ccbbecfdc 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -275,9 +275,13 @@ int parse_callchain_record(const char *arg, struct callchain_param *param)
if (tok) {
unsigned long size;
- size = strtoul(tok, &name, 0);
- if (size < (unsigned) sysctl__max_stack())
- param->max_stack = size;
+ if (!strncmp(tok, "defer", sizeof("defer"))) {
+ param->defer = true;
+ } else {
+ size = strtoul(tok, &name, 0);
+ if (size < (unsigned) sysctl__max_stack())
+ param->max_stack = size;
+ }
}
break;
@@ -314,6 +318,12 @@ int parse_callchain_record(const char *arg, struct callchain_param *param)
} while (0);
free(buf);
+
+ if (param->defer && param->record_mode != CALLCHAIN_FP) {
+ pr_err("callchain: deferred callchain only works with FP\n");
+ return -EINVAL;
+ }
+
return ret;
}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 86ed9e4d04f9ee7b..d5ae4fbb7ce5fa44 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -98,6 +98,7 @@ extern bool dwarf_callchain_users;
struct callchain_param {
bool enabled;
+ bool defer;
enum perf_call_graph_mode record_mode;
u32 dump_size;
enum chain_mode mode;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 5ee3e7dee93fbbcb..7772ee9cfe3ac1c7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1065,6 +1065,9 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
pr_info("Disabling user space callchains for function trace event.\n");
attr->exclude_callchain_user = 1;
}
+
+ if (param->defer && !attr->exclude_callchain_user)
+ attr->defer_callchain = 1;
}
void evsel__config_callchain(struct evsel *evsel, struct record_opts *opts,
@@ -1511,6 +1514,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
attr->mmap2 = track && !perf_missing_features.mmap2;
attr->comm = track;
attr->build_id = track && opts->build_id;
+ attr->defer_output = track && callchain->defer;
/*
* ksymbol is tracked separately with text poke because it needs to be
@@ -2199,6 +2203,10 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
static void evsel__disable_missing_features(struct evsel *evsel)
{
+ if (perf_missing_features.defer_callchain && evsel->core.attr.defer_callchain)
+ evsel->core.attr.defer_callchain = 0;
+ if (perf_missing_features.defer_callchain && evsel->core.attr.defer_output)
+ evsel->core.attr.defer_output = 0;
if (perf_missing_features.inherit_sample_read && evsel->core.attr.inherit &&
(evsel->core.attr.sample_type & PERF_SAMPLE_READ))
evsel->core.attr.inherit = 0;
@@ -2473,6 +2481,13 @@ static bool evsel__detect_missing_features(struct evsel *evsel, struct perf_cpu
/* Please add new feature detection here. */
+ attr.defer_callchain = true;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.defer_callchain = true;
+ pr_debug2("switching off deferred callchain support\n");
+ attr.defer_callchain = false;
+
attr.inherit = true;
attr.sample_type = PERF_SAMPLE_READ | PERF_SAMPLE_TID;
if (has_attr_feature(&attr, /*flags=*/0))
@@ -2584,6 +2599,10 @@ static bool evsel__detect_missing_features(struct evsel *evsel, struct perf_cpu
errno = old_errno;
check:
+ if ((evsel->core.attr.defer_callchain || evsel->core.attr.defer_output) &&
+ perf_missing_features.defer_callchain)
+ return true;
+
if (evsel->core.attr.inherit &&
(evsel->core.attr.sample_type & PERF_SAMPLE_READ) &&
perf_missing_features.inherit_sample_read)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3ae4ac8f9a37e009..a08130ff2e47a887 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -221,6 +221,7 @@ struct perf_missing_features {
bool branch_counters;
bool aux_action;
bool inherit_sample_read;
+ bool defer_callchain;
};
extern struct perf_missing_features perf_missing_features;
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v5 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
2025-11-20 2:10 [PATCHSET v5 0/6] perf tools: Add deferred callchain support Namhyung Kim
` (2 preceding siblings ...)
2025-11-20 2:10 ` [PATCH v5 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
@ 2025-11-20 2:10 ` Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 5/6] perf tools: Merge deferred user callchains Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
5 siblings, 0 replies; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 2:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf
Handle the deferred callchains in the script output.
$ perf script
...
pwd 2312 121.163435: 249113 cpu/cycles/P:
ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
b00000006 (cookie) ([unknown])
pwd 2312 121.163447: DEFERRED CALLCHAIN [cookie: b00000006]
7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
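The "(cookie)" entry shown above comes from this check (condensed from the
evsel_fprintf.c change below): if the node IP equals the cookie saved in
the sample, print a marker instead of symbolizing it.

        if (sample->deferred_callchain && sample->deferred_cookie == node->ip)
                printed += fprintf(fp, "(cookie)");
        else if (print_symoffset)
                printed += __symbol__fprintf_symname_offs(sym, &node_al,
                                                          print_unknown_as_addr,
                                                          true, fp);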
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/builtin-script.c | 89 +++++++++++++++++++++++++++++++++
tools/perf/util/evsel_fprintf.c | 5 +-
2 files changed, 93 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 011962e1ee0f6898..85b42205a71b3993 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2706,6 +2706,94 @@ static int process_sample_event(const struct perf_tool *tool,
return ret;
}
+static int process_deferred_sample_event(const struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct evsel *evsel,
+ struct machine *machine)
+{
+ struct perf_script *scr = container_of(tool, struct perf_script, tool);
+ struct perf_event_attr *attr = &evsel->core.attr;
+ struct evsel_script *es = evsel->priv;
+ unsigned int type = output_type(attr->type);
+ struct addr_location al;
+ FILE *fp = es->fp;
+ int ret = 0;
+
+ if (output[type].fields == 0)
+ return 0;
+
+ /* Set thread to NULL to indicate addr_al and al are not initialized */
+ addr_location__init(&al);
+
+ if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
+ sample->time)) {
+ goto out_put;
+ }
+
+ if (debug_mode) {
+ if (sample->time < last_timestamp) {
+ pr_err("Samples misordered, previous: %" PRIu64
+ " this: %" PRIu64 "\n", last_timestamp,
+ sample->time);
+ nr_unordered++;
+ }
+ last_timestamp = sample->time;
+ goto out_put;
+ }
+
+ if (filter_cpu(sample))
+ goto out_put;
+
+ if (machine__resolve(machine, &al, sample) < 0) {
+ pr_err("problem processing %d event, skipping it.\n",
+ event->header.type);
+ ret = -1;
+ goto out_put;
+ }
+
+ if (al.filtered)
+ goto out_put;
+
+ if (!show_event(sample, evsel, al.thread, &al, NULL))
+ goto out_put;
+
+ if (evswitch__discard(&scr->evswitch, evsel))
+ goto out_put;
+
+ perf_sample__fprintf_start(scr, sample, al.thread, evsel,
+ PERF_RECORD_CALLCHAIN_DEFERRED, fp);
+ fprintf(fp, "DEFERRED CALLCHAIN [cookie: %llx]",
+ (unsigned long long)event->callchain_deferred.cookie);
+
+ if (PRINT_FIELD(IP)) {
+ struct callchain_cursor *cursor = NULL;
+
+ if (symbol_conf.use_callchain && sample->callchain) {
+ cursor = get_tls_callchain_cursor();
+ if (thread__resolve_callchain(al.thread, cursor, evsel,
+ sample, NULL, NULL,
+ scripting_max_stack)) {
+ pr_info("cannot resolve deferred callchains\n");
+ cursor = NULL;
+ }
+ }
+
+ fputc(cursor ? '\n' : ' ', fp);
+ sample__fprintf_sym(sample, &al, 0, output[type].print_ip_opts,
+ cursor, symbol_conf.bt_stop_list, fp);
+ }
+
+ fprintf(fp, "\n");
+
+ if (verbose > 0)
+ fflush(fp);
+
+out_put:
+ addr_location__exit(&al);
+ return ret;
+}
+
// Used when scr->per_event_dump is not set
static struct evsel_script es_stdout;
@@ -4303,6 +4391,7 @@ int cmd_script(int argc, const char **argv)
perf_tool__init(&script.tool, !unsorted_dump);
script.tool.sample = process_sample_event;
+ script.tool.callchain_deferred = process_deferred_sample_event;
script.tool.mmap = perf_event__process_mmap;
script.tool.mmap2 = perf_event__process_mmap2;
script.tool.comm = perf_event__process_comm;
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 103984b29b1e10ae..10f1a03c28601e36 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -168,7 +168,10 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
node_al.addr = addr;
node_al.map = map__get(map);
- if (print_symoffset) {
+ if (sample->deferred_callchain &&
+ sample->deferred_cookie == node->ip) {
+ printed += fprintf(fp, "(cookie)");
+ } else if (print_symoffset) {
printed += __symbol__fprintf_symname_offs(sym, &node_al,
print_unknown_as_addr,
true, fp);
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v5 5/6] perf tools: Merge deferred user callchains
2025-11-20 2:10 [PATCHSET v5 0/6] perf tools: Add deferred callchain support Namhyung Kim
` (3 preceding siblings ...)
2025-11-20 2:10 ` [PATCH v5 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Namhyung Kim
@ 2025-11-20 2:10 ` Namhyung Kim
2025-11-20 5:13 ` Ian Rogers
2025-11-20 2:10 ` [PATCH v5 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
5 siblings, 1 reply; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 2:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf
Save samples with deferred callchains in a separate list and deliver
them after merging in the user callchains. If users don't want the
merge, they can set tool->merge_deferred_callchains to false to prevent
this behavior.
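Roughly, the merge itself (sample__merge_deferred_callchain() below)
stitches the two records like this: keep the original entries up to and
including the PERF_CONTEXT_USER_DEFERRED marker, drop the trailing
cookie, and append the user IPs from the matching CALLCHAIN_DEFERRED
record.

        u64 nr_orig = sample_orig->callchain->nr - 1;   /* drop the cookie */
        u64 nr_deferred = sample_callchain->callchain->nr;

        callchain->nr = nr_orig + nr_deferred;
        memcpy(callchain->ips, sample_orig->callchain->ips,
               nr_orig * sizeof(u64));
        memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips,
               nr_deferred * sizeof(u64));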
With the previous example data, perf script now shows the merged callchains.
$ perf script
...
pwd 2312 121.163435: 249113 cpu/cycles/P:
ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
...
The old output can be obtained using the --no-merge-callchains option.
Also perf report now shows the user callchain entries at the end.
$ perf report --no-children --stdio -q -S __build_id_parse.isra.0
# symbol: __build_id_parse.isra.0
8.40% pwd [kernel.kallsyms]
|
---__build_id_parse.isra.0
perf_event_mmap
mprotect_fixup
do_mprotect_pkey
__x64_sys_mprotect
do_syscall_64
entry_SYSCALL_64_after_hwframe
mprotect
_dl_sysdep_start
_dl_start_user
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/Documentation/perf-script.txt | 5 ++
tools/perf/builtin-inject.c | 1 +
tools/perf/builtin-report.c | 1 +
tools/perf/builtin-script.c | 4 ++
tools/perf/util/callchain.c | 29 ++++++++++
tools/perf/util/callchain.h | 3 ++
tools/perf/util/evlist.c | 1 +
tools/perf/util/evlist.h | 2 +
tools/perf/util/session.c | 67 +++++++++++++++++++++++-
tools/perf/util/tool.c | 2 +
tools/perf/util/tool.h | 1 +
11 files changed, 115 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 28bec7e78bc858ba..03d1129606328d6d 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -527,6 +527,11 @@ include::itrace.txt[]
The known limitations include exception handing such as
setjmp/longjmp will have calls/returns not match.
+--merge-callchains::
+ Enable merging deferred user callchains if available. This is the
+ default behavior. If you want to see separate CALLCHAIN_DEFERRED
+ records for some reason, use --no-merge-callchains explicitly.
+
:GMEXAMPLECMD: script
:GMEXAMPLESUBCMD:
include::guest-files.txt[]
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index bd9245d2dd41aa48..51d2721b6db9dccb 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -2527,6 +2527,7 @@ int cmd_inject(int argc, const char **argv)
inject.tool.auxtrace = perf_event__repipe_auxtrace;
inject.tool.bpf_metadata = perf_event__repipe_op2_synth;
inject.tool.dont_split_sample_group = true;
+ inject.tool.merge_deferred_callchains = false;
inject.session = __perf_session__new(&data, &inject.tool,
/*trace_event_repipe=*/inject.output.is_pipe,
/*host_env=*/NULL);
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2bc269f5fcef8023..add6b1c2aaf04270 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1614,6 +1614,7 @@ int cmd_report(int argc, const char **argv)
report.tool.event_update = perf_event__process_event_update;
report.tool.feature = process_feature_event;
report.tool.ordering_requires_timestamps = true;
+ report.tool.merge_deferred_callchains = !dump_trace;
session = perf_session__new(&data, &report.tool);
if (IS_ERR(session)) {
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 85b42205a71b3993..62e43d3c5ad731a0 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -4009,6 +4009,7 @@ int cmd_script(int argc, const char **argv)
bool header_only = false;
bool script_started = false;
bool unsorted_dump = false;
+ bool merge_deferred_callchains = true;
char *rec_script_path = NULL;
char *rep_script_path = NULL;
struct perf_session *session;
@@ -4162,6 +4163,8 @@ int cmd_script(int argc, const char **argv)
"Guest code can be found in hypervisor process"),
OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
"Enable LBR callgraph stitching approach"),
+ OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains,
+ "Enable merge deferred user callchains"),
OPTS_EVSWITCH(&script.evswitch),
OPT_END()
};
@@ -4418,6 +4421,7 @@ int cmd_script(int argc, const char **argv)
script.tool.throttle = process_throttle_event;
script.tool.unthrottle = process_throttle_event;
script.tool.ordering_requires_timestamps = true;
+ script.tool.merge_deferred_callchains = merge_deferred_callchains;
session = perf_session__new(&data, &script.tool);
if (IS_ERR(session))
return PTR_ERR(session);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 2884187ccbbecfdc..71dc5a070065dd2a 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1838,3 +1838,32 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
}
return 0;
}
+
+int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
+ struct perf_sample *sample_callchain)
+{
+ u64 nr_orig = sample_orig->callchain->nr - 1;
+ u64 nr_deferred = sample_callchain->callchain->nr;
+ struct ip_callchain *callchain;
+
+ if (sample_orig->callchain->nr < 2) {
+ sample_orig->deferred_callchain = false;
+ return -EINVAL;
+ }
+
+ callchain = calloc(1 + nr_orig + nr_deferred, sizeof(u64));
+ if (callchain == NULL) {
+ sample_orig->deferred_callchain = false;
+ return -ENOMEM;
+ }
+
+ callchain->nr = nr_orig + nr_deferred;
+ /* copy original including PERF_CONTEXT_USER_DEFERRED (but the cookie) */
+ memcpy(callchain->ips, sample_orig->callchain->ips, nr_orig * sizeof(u64));
+ /* copy deferred user callchains */
+ memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips,
+ nr_deferred * sizeof(u64));
+
+ sample_orig->callchain = callchain;
+ return 0;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index d5ae4fbb7ce5fa44..2a52af8c80ace33c 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -318,4 +318,7 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
struct perf_sample *sample, int max_stack,
bool symbols, callchain_iter_fn cb, void *data);
+int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
+ struct perf_sample *sample_callchain);
+
#endif /* __PERF_CALLCHAIN_H */
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e8217efdda5323c6..03674d2cbd015e4f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -85,6 +85,7 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_map *cpus,
evlist->ctl_fd.pos = -1;
evlist->nr_br_cntr = -1;
metricgroup__rblist_init(&evlist->metric_events);
+ INIT_LIST_HEAD(&evlist->deferred_samples);
}
struct evlist *evlist__new(void)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 5e71e3dc60423079..911834ae7c2a6f76 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -92,6 +92,8 @@ struct evlist {
* of struct metric_expr.
*/
struct rblist metric_events;
+ /* samples with deferred_callchain would wait here. */
+ struct list_head deferred_samples;
};
struct evsel_str_handler {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 361e15c1f26a96d0..2e777fd1bcf6707b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1285,6 +1285,60 @@ static int evlist__deliver_sample(struct evlist *evlist, const struct perf_tool
per_thread);
}
+struct deferred_event {
+ struct list_head list;
+ union perf_event *event;
+};
+
+static int evlist__deliver_deferred_samples(struct evlist *evlist,
+ const struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ struct deferred_event *de, *tmp;
+ struct evsel *evsel;
+ int ret = 0;
+
+ if (!tool->merge_deferred_callchains) {
+ evsel = evlist__id2evsel(evlist, sample->id);
+ return tool->callchain_deferred(tool, event, sample,
+ evsel, machine);
+ }
+
+ list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
+ struct perf_sample orig_sample;
+
+ ret = evlist__parse_sample(evlist, de->event, &orig_sample);
+ if (ret < 0) {
+ pr_err("failed to parse original sample\n");
+ break;
+ }
+
+ if (sample->tid != orig_sample.tid)
+ continue;
+
+ if (event->callchain_deferred.cookie == orig_sample.deferred_cookie)
+ sample__merge_deferred_callchain(&orig_sample, sample);
+ else
+ orig_sample.deferred_callchain = false;
+
+ evsel = evlist__id2evsel(evlist, orig_sample.id);
+ ret = evlist__deliver_sample(evlist, tool, de->event,
+ &orig_sample, evsel, machine);
+
+ if (orig_sample.deferred_callchain)
+ free(orig_sample.callchain);
+
+ list_del(&de->list);
+ free(de);
+
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
static int machines__deliver_event(struct machines *machines,
struct evlist *evlist,
union perf_event *event,
@@ -1313,6 +1367,16 @@ static int machines__deliver_event(struct machines *machines,
return 0;
}
dump_sample(evsel, event, sample, perf_env__arch(machine->env));
+ if (sample->deferred_callchain && tool->merge_deferred_callchains) {
+ struct deferred_event *de = malloc(sizeof(*de));
+
+ if (de == NULL)
+ return -ENOMEM;
+
+ de->event = event;
+ list_add_tail(&de->list, &evlist->deferred_samples);
+ return 0;
+ }
return evlist__deliver_sample(evlist, tool, event, sample, evsel, machine);
case PERF_RECORD_MMAP:
return tool->mmap(tool, event, sample, machine);
@@ -1372,7 +1436,8 @@ static int machines__deliver_event(struct machines *machines,
return tool->aux_output_hw_id(tool, event, sample, machine);
case PERF_RECORD_CALLCHAIN_DEFERRED:
dump_deferred_callchain(evsel, event, sample);
- return tool->callchain_deferred(tool, event, sample, evsel, machine);
+ return evlist__deliver_deferred_samples(evlist, tool, event,
+ sample, machine);
default:
++evlist->stats.nr_unknown_events;
return -1;
diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c
index e77f0e2ecc1f79db..27ba5849c74a2e7d 100644
--- a/tools/perf/util/tool.c
+++ b/tools/perf/util/tool.c
@@ -266,6 +266,7 @@ void perf_tool__init(struct perf_tool *tool, bool ordered_events)
tool->cgroup_events = false;
tool->no_warn = false;
tool->show_feat_hdr = SHOW_FEAT_NO_HEADER;
+ tool->merge_deferred_callchains = true;
tool->sample = process_event_sample_stub;
tool->mmap = process_event_stub;
@@ -448,6 +449,7 @@ void delegate_tool__init(struct delegate_tool *tool, struct perf_tool *delegate)
tool->tool.cgroup_events = delegate->cgroup_events;
tool->tool.no_warn = delegate->no_warn;
tool->tool.show_feat_hdr = delegate->show_feat_hdr;
+ tool->tool.merge_deferred_callchains = delegate->merge_deferred_callchains;
tool->tool.sample = delegate_sample;
tool->tool.read = delegate_read;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 9b9f0a8cbf3de4b5..e96b69d25a5b737d 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -90,6 +90,7 @@ struct perf_tool {
bool cgroup_events;
bool no_warn;
bool dont_split_sample_group;
+ bool merge_deferred_callchains;
enum show_feature_header show_feat_hdr;
};
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v5 6/6] perf tools: Flush remaining samples w/o deferred callchains
2025-11-20 2:10 [PATCHSET v5 0/6] perf tools: Add deferred callchain support Namhyung Kim
` (4 preceding siblings ...)
2025-11-20 2:10 ` [PATCH v5 5/6] perf tools: Merge deferred user callchains Namhyung Kim
@ 2025-11-20 2:10 ` Namhyung Kim
2025-11-20 5:29 ` Ian Rogers
5 siblings, 1 reply; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 2:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf
It's possible that some kernel samples don't have matching deferred
callchain records because the profiling session ended before the threads
returned to userspace. Flush those samples before finishing the session.
Also, 32-bit systems can map the data file partially. In that case,
deferred samples won't point to valid data once the mapping moves to the
next portion of the file. Copy the original sample before the current
data is unmapped.
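The copy-before-unmap case reduces to this (condensed from
evlist__copy_deferred_samples() below): duplicate each still-pending
event so the deferred_samples list never points into an unmapped window,
and deliver the sample unmerged if the allocation fails.

        size_t sz = de->event->header.size;
        void *buf = malloc(sz);

        if (buf) {
                memcpy(buf, de->event, sz);
                de->event = buf;
                de->allocated = true;
        } else {
                /* allocation failed: deliver this sample now, without merging */
        }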
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/session.c | 98 +++++++++++++++++++++++++++++++++++++++
1 file changed, 98 insertions(+)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2e777fd1bcf6707b..b781e01ddcb4876b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1288,8 +1288,13 @@ static int evlist__deliver_sample(struct evlist *evlist, const struct perf_tool
struct deferred_event {
struct list_head list;
union perf_event *event;
+ bool allocated;
};
+/*
+ * This is called when a deferred callchain record comes up. Find all matching
+ * samples, merge the callchains and process them.
+ */
static int evlist__deliver_deferred_samples(struct evlist *evlist,
const struct perf_tool *tool,
union perf_event *event,
@@ -1331,6 +1336,86 @@ static int evlist__deliver_deferred_samples(struct evlist *evlist,
free(orig_sample.callchain);
list_del(&de->list);
+ if (de->allocated)
+ free(de->event);
+ free(de);
+
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
+/*
+ * This is called when the backing mmap is about to go away. It needs to save
+ * the original sample data until it finds the matching deferred callchains.
+ */
+static void evlist__copy_deferred_samples(struct evlist *evlist,
+ const struct perf_tool *tool,
+ struct machine *machine)
+{
+ struct deferred_event *de, *tmp;
+ struct evsel *evsel;
+ int ret = 0;
+
+ list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
+ struct perf_sample sample;
+ size_t sz = de->event->header.size;
+ void *buf;
+
+ if (de->allocated)
+ continue;
+
+ buf = malloc(sz);
+ if (buf) {
+ memcpy(buf, de->event, sz);
+ de->event = buf;
+ de->allocated = true;
+ continue;
+ }
+
+ /* The allocation failed, flush the sample now */
+ ret = evlist__parse_sample(evlist, de->event, &sample);
+ if (ret == 0) {
+ evsel = evlist__id2evsel(evlist, sample.id);
+ evlist__deliver_sample(evlist, tool, de->event,
+ &sample, evsel, machine);
+ }
+
+ list_del(&de->list);
+ BUG_ON(de->allocated);
+ free(de);
+ }
+}
+
+/*
+ * This is called at the end of the data processing for the session. Flush the
+ * remaining samples as there's no hope for matching deferred callchains.
+ */
+static int evlist__flush_deferred_samples(struct evlist *evlist,
+ const struct perf_tool *tool,
+ struct machine *machine)
+{
+ struct deferred_event *de, *tmp;
+ struct evsel *evsel;
+ int ret = 0;
+
+ list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
+ struct perf_sample sample;
+
+ ret = evlist__parse_sample(evlist, de->event, &sample);
+ if (ret < 0) {
+ pr_err("failed to parse original sample\n");
+ break;
+ }
+
+ evsel = evlist__id2evsel(evlist, sample.id);
+ ret = evlist__deliver_sample(evlist, tool, de->event,
+ &sample, evsel, machine);
+
+ list_del(&de->list);
+ if (de->allocated)
+ free(de->event);
free(de);
if (ret)
@@ -1374,6 +1459,7 @@ static int machines__deliver_event(struct machines *machines,
return -ENOMEM;
de->event = event;
+ de->allocated = false;
list_add_tail(&de->list, &evlist->deferred_samples);
return 0;
}
@@ -2218,6 +2304,8 @@ reader__mmap(struct reader *rd, struct perf_session *session)
}
if (mmaps[rd->mmap_idx]) {
+ evlist__copy_deferred_samples(session->evlist, session->tool,
+ &session->machines.host);
munmap(mmaps[rd->mmap_idx], rd->mmap_size);
mmaps[rd->mmap_idx] = NULL;
}
@@ -2372,6 +2460,11 @@ static int __perf_session__process_events(struct perf_session *session)
if (err)
goto out_err;
err = auxtrace__flush_events(session, tool);
+ if (err)
+ goto out_err;
+ err = evlist__flush_deferred_samples(session->evlist,
+ session->tool,
+ &session->machines.host);
if (err)
goto out_err;
err = perf_session__flush_thread_stacks(session);
@@ -2494,6 +2587,11 @@ static int __perf_session__process_dir_events(struct perf_session *session)
if (ret)
goto out_err;
+ ret = evlist__flush_deferred_samples(session->evlist, tool,
+ &session->machines.host);
+ if (ret)
+ goto out_err;
+
ret = perf_session__flush_thread_stacks(session);
out_err:
ui_progress__finish();
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v5 5/6] perf tools: Merge deferred user callchains
2025-11-20 2:10 ` [PATCH v5 5/6] perf tools: Merge deferred user callchains Namhyung Kim
@ 2025-11-20 5:13 ` Ian Rogers
2025-11-20 21:44 ` Namhyung Kim
0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2025-11-20 5:13 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, James Clark, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Jens Remus,
Mathieu Desnoyers, linux-trace-kernel, bpf
On Wed, Nov 19, 2025 at 6:11 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Save samples with deferred callchains in a separate list and deliver
> them after merging the user callchains. If users don't want to merge
> they can set tool->merge_deferred_callchains to false to prevent the
> behavior.
>
> With previous result, now perf script will show the merged callchains.
>
> $ perf script
> ...
> pwd 2312 121.163435: 249113 cpu/cycles/P:
> ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
> ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
> ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
> ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
> ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
> ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
> ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
> 7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> 7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> 7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> ...
>
> The old output can be get using --no-merge-callchain option.
> Also perf report can get the user callchain entry at the end.
>
> $ perf report --no-children --stdio -q -S __build_id_parse.isra.0
> # symbol: __build_id_parse.isra.0
> 8.40% pwd [kernel.kallsyms]
> |
> ---__build_id_parse.isra.0
> perf_event_mmap
> mprotect_fixup
> do_mprotect_pkey
> __x64_sys_mprotect
> do_syscall_64
> entry_SYSCALL_64_after_hwframe
> mprotect
> _dl_sysdep_start
> _dl_start_user
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/Documentation/perf-script.txt | 5 ++
> tools/perf/builtin-inject.c | 1 +
> tools/perf/builtin-report.c | 1 +
> tools/perf/builtin-script.c | 4 ++
> tools/perf/util/callchain.c | 29 ++++++++++
> tools/perf/util/callchain.h | 3 ++
> tools/perf/util/evlist.c | 1 +
> tools/perf/util/evlist.h | 2 +
> tools/perf/util/session.c | 67 +++++++++++++++++++++++-
> tools/perf/util/tool.c | 2 +
> tools/perf/util/tool.h | 1 +
> 11 files changed, 115 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index 28bec7e78bc858ba..03d1129606328d6d 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -527,6 +527,11 @@ include::itrace.txt[]
> The known limitations include exception handing such as
> setjmp/longjmp will have calls/returns not match.
>
> +--merge-callchains::
> + Enable merging deferred user callchains if available. This is the
> + default behavior. If you want to see separate CALLCHAIN_DEFERRED
> + records for some reason, use --no-merge-callchains explicitly.
> +
> :GMEXAMPLECMD: script
> :GMEXAMPLESUBCMD:
> include::guest-files.txt[]
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index bd9245d2dd41aa48..51d2721b6db9dccb 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -2527,6 +2527,7 @@ int cmd_inject(int argc, const char **argv)
> inject.tool.auxtrace = perf_event__repipe_auxtrace;
> inject.tool.bpf_metadata = perf_event__repipe_op2_synth;
> inject.tool.dont_split_sample_group = true;
> + inject.tool.merge_deferred_callchains = false;
> inject.session = __perf_session__new(&data, &inject.tool,
> /*trace_event_repipe=*/inject.output.is_pipe,
> /*host_env=*/NULL);
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 2bc269f5fcef8023..add6b1c2aaf04270 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1614,6 +1614,7 @@ int cmd_report(int argc, const char **argv)
> report.tool.event_update = perf_event__process_event_update;
> report.tool.feature = process_feature_event;
> report.tool.ordering_requires_timestamps = true;
> + report.tool.merge_deferred_callchains = !dump_trace;
>
> session = perf_session__new(&data, &report.tool);
> if (IS_ERR(session)) {
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 85b42205a71b3993..62e43d3c5ad731a0 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -4009,6 +4009,7 @@ int cmd_script(int argc, const char **argv)
> bool header_only = false;
> bool script_started = false;
> bool unsorted_dump = false;
> + bool merge_deferred_callchains = true;
> char *rec_script_path = NULL;
> char *rep_script_path = NULL;
> struct perf_session *session;
> @@ -4162,6 +4163,8 @@ int cmd_script(int argc, const char **argv)
> "Guest code can be found in hypervisor process"),
> OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
> "Enable LBR callgraph stitching approach"),
> + OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains,
> + "Enable merge deferred user callchains"),
> OPTS_EVSWITCH(&script.evswitch),
> OPT_END()
> };
> @@ -4418,6 +4421,7 @@ int cmd_script(int argc, const char **argv)
> script.tool.throttle = process_throttle_event;
> script.tool.unthrottle = process_throttle_event;
> script.tool.ordering_requires_timestamps = true;
> + script.tool.merge_deferred_callchains = merge_deferred_callchains;
> session = perf_session__new(&data, &script.tool);
> if (IS_ERR(session))
> return PTR_ERR(session);
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 2884187ccbbecfdc..71dc5a070065dd2a 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -1838,3 +1838,32 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
> }
> return 0;
> }
> +
> +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
> + struct perf_sample *sample_callchain)
> +{
> + u64 nr_orig = sample_orig->callchain->nr - 1;
> + u64 nr_deferred = sample_callchain->callchain->nr;
> + struct ip_callchain *callchain;
> +
> + if (sample_orig->callchain->nr < 2) {
> + sample_orig->deferred_callchain = false;
> + return -EINVAL;
> + }
> +
> + callchain = calloc(1 + nr_orig + nr_deferred, sizeof(u64));
> + if (callchain == NULL) {
> + sample_orig->deferred_callchain = false;
> + return -ENOMEM;
> + }
> +
> + callchain->nr = nr_orig + nr_deferred;
> + /* copy original including PERF_CONTEXT_USER_DEFERRED (but the cookie) */
> + memcpy(callchain->ips, sample_orig->callchain->ips, nr_orig * sizeof(u64));
> + /* copy deferred user callchains */
> + memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips,
> + nr_deferred * sizeof(u64));
> +
> + sample_orig->callchain = callchain;
> + return 0;
> +}
> diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
> index d5ae4fbb7ce5fa44..2a52af8c80ace33c 100644
> --- a/tools/perf/util/callchain.h
> +++ b/tools/perf/util/callchain.h
> @@ -318,4 +318,7 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
> struct perf_sample *sample, int max_stack,
> bool symbols, callchain_iter_fn cb, void *data);
>
> +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
> + struct perf_sample *sample_callchain);
> +
> #endif /* __PERF_CALLCHAIN_H */
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index e8217efdda5323c6..03674d2cbd015e4f 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -85,6 +85,7 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_map *cpus,
> evlist->ctl_fd.pos = -1;
> evlist->nr_br_cntr = -1;
> metricgroup__rblist_init(&evlist->metric_events);
> + INIT_LIST_HEAD(&evlist->deferred_samples);
> }
>
> struct evlist *evlist__new(void)
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 5e71e3dc60423079..911834ae7c2a6f76 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -92,6 +92,8 @@ struct evlist {
> * of struct metric_expr.
> */
> struct rblist metric_events;
> + /* samples with deferred_callchain would wait here. */
> + struct list_head deferred_samples;
> };
>
> struct evsel_str_handler {
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 361e15c1f26a96d0..2e777fd1bcf6707b 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1285,6 +1285,60 @@ static int evlist__deliver_sample(struct evlist *evlist, const struct perf_tool
> per_thread);
> }
>
> +struct deferred_event {
> + struct list_head list;
> + union perf_event *event;
Is this an old version of the patch? There's no comment, and it seems the
event's memory isn't copied. I'm worried because we have events in
stack-allocated memory, such as:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/session.c?h=perf-tools-next#n1618
or copies:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-inject.c?h=perf-tools-next#n336
or just because the ring buffer overwrites itself. My belief is that we
don't hold events, or the associated parsed sample, longer than the tool
callback, because they'll be wrong/corrupt after that. Here the deferred
callchain events are all being held longer than a single tool event
callback.
> +};
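For reference, a defensive copy at the point the event is queued could look
roughly like the sketch below; it reuses the deferred_event layout above but
is not part of the posted patch:

static struct deferred_event *deferred_event__new(union perf_event *event)
{
	/*
	 * Allocate the node and the copy together so the event stays
	 * valid after the tool callback returns, even if the original
	 * lives in a reused ring-buffer mmap or on the stack.
	 */
	size_t sz = event->header.size;
	struct deferred_event *de = malloc(sizeof(*de) + sz);

	if (de == NULL)
		return NULL;

	de->event = (union perf_event *)(de + 1);
	memcpy(de->event, event, sz);
	return de;
}

Freeing is then just free(de), with no separate ownership flag.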
> +
> +static int evlist__deliver_deferred_samples(struct evlist *evlist,
> + const struct perf_tool *tool,
> + union perf_event *event,
> + struct perf_sample *sample,
> + struct machine *machine)
> +{
> + struct deferred_event *de, *tmp;
> + struct evsel *evsel;
> + int ret = 0;
> +
> + if (!tool->merge_deferred_callchains) {
> + evsel = evlist__id2evsel(evlist, sample->id);
> + return tool->callchain_deferred(tool, event, sample,
> + evsel, machine);
> + }
> +
> + list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
> + struct perf_sample orig_sample;
> +
> + ret = evlist__parse_sample(evlist, de->event, &orig_sample);
> + if (ret < 0) {
> + pr_err("failed to parse original sample\n");
> + break;
> + }
> +
> + if (sample->tid != orig_sample.tid)
> + continue;
> +
> + if (event->callchain_deferred.cookie == orig_sample.deferred_cookie)
> + sample__merge_deferred_callchain(&orig_sample, sample);
> + else
> + orig_sample.deferred_callchain = false;
> +
> + evsel = evlist__id2evsel(evlist, orig_sample.id);
> + ret = evlist__deliver_sample(evlist, tool, de->event,
> + &orig_sample, evsel, machine);
> +
> + if (orig_sample.deferred_callchain)
> + free(orig_sample.callchain);
> +
> + list_del(&de->list);
There's no free of de->event.
> + free(de);
> +
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> static int machines__deliver_event(struct machines *machines,
> struct evlist *evlist,
> union perf_event *event,
> @@ -1313,6 +1367,16 @@ static int machines__deliver_event(struct machines *machines,
> return 0;
> }
> dump_sample(evsel, event, sample, perf_env__arch(machine->env));
> + if (sample->deferred_callchain && tool->merge_deferred_callchains) {
> + struct deferred_event *de = malloc(sizeof(*de));
> +
> + if (de == NULL)
> + return -ENOMEM;
> +
> + de->event = event;
Here the event is assigned but not copied.
Thanks,
Ian
> + list_add_tail(&de->list, &evlist->deferred_samples);
> + return 0;
> + }
> return evlist__deliver_sample(evlist, tool, event, sample, evsel, machine);
> case PERF_RECORD_MMAP:
> return tool->mmap(tool, event, sample, machine);
> @@ -1372,7 +1436,8 @@ static int machines__deliver_event(struct machines *machines,
> return tool->aux_output_hw_id(tool, event, sample, machine);
> case PERF_RECORD_CALLCHAIN_DEFERRED:
> dump_deferred_callchain(evsel, event, sample);
> - return tool->callchain_deferred(tool, event, sample, evsel, machine);
> + return evlist__deliver_deferred_samples(evlist, tool, event,
> + sample, machine);
> default:
> ++evlist->stats.nr_unknown_events;
> return -1;
> diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c
> index e77f0e2ecc1f79db..27ba5849c74a2e7d 100644
> --- a/tools/perf/util/tool.c
> +++ b/tools/perf/util/tool.c
> @@ -266,6 +266,7 @@ void perf_tool__init(struct perf_tool *tool, bool ordered_events)
> tool->cgroup_events = false;
> tool->no_warn = false;
> tool->show_feat_hdr = SHOW_FEAT_NO_HEADER;
> + tool->merge_deferred_callchains = true;
>
> tool->sample = process_event_sample_stub;
> tool->mmap = process_event_stub;
> @@ -448,6 +449,7 @@ void delegate_tool__init(struct delegate_tool *tool, struct perf_tool *delegate)
> tool->tool.cgroup_events = delegate->cgroup_events;
> tool->tool.no_warn = delegate->no_warn;
> tool->tool.show_feat_hdr = delegate->show_feat_hdr;
> + tool->tool.merge_deferred_callchains = delegate->merge_deferred_callchains;
>
> tool->tool.sample = delegate_sample;
> tool->tool.read = delegate_read;
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 9b9f0a8cbf3de4b5..e96b69d25a5b737d 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -90,6 +90,7 @@ struct perf_tool {
> bool cgroup_events;
> bool no_warn;
> bool dont_split_sample_group;
> + bool merge_deferred_callchains;
> enum show_feature_header show_feat_hdr;
> };
>
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
* Re: [PATCH v5 6/6] perf tools: Flush remaining samples w/o deferred callchains
2025-11-20 2:10 ` [PATCH v5 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
@ 2025-11-20 5:29 ` Ian Rogers
2025-11-20 7:21 ` Ian Rogers
0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2025-11-20 5:29 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, James Clark, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Jens Remus,
Mathieu Desnoyers, linux-trace-kernel, bpf
On Wed, Nov 19, 2025 at 6:11 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> It's possible that some kernel samples don't have matching deferred
> callchain records when the profiling session ended before the threads
> came back to userspace. Let's flush those samples before finishing the
> session.
>
> Also, 32-bit systems can see a partial mmap of the data. In that case,
> deferred samples won't point to the correct data once the mapping moves
> to the next portion of the file. Copy the original sample before the
> current data is unmapped.
I think it is simpler to always copy. We may have events that come from
synthesis, inject, etc. and not from the reader. Relying on callers to
know whether someone has already copied the event, and to make a
defensive copy on their behalf otherwise, just feels error-prone.
In the python session API I need to deal with the lifetime of events.
Currently the events are copied:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/python.c?h=perf-tools-next#n507
and I'm doing this for session tool callbacks:
https://lore.kernel.org/lkml/20251029053413.355154-12-irogers@google.com/
I think it can be made lazier, given that the tool callback can assume
the event and sample are valid: we can delay copying the event/sample
until the pyevent has a reference count >1 and we're returning out of
the tool callback. As for keeping some kind of global knowledge in the
reader to maintain the correctness of memory, I'm just not clear on how
to make that always work.
Thanks,
Ian
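A rough sketch of that copy-on-escape idea, with hypothetical names rather
than existing perf or python.c code, might look like:

struct event_wrapper {
	union perf_event *event;	/* borrowed during the callback */
	bool		  owned;	/* true once we hold our own copy */
	int		  refcount;
};

/* Called when the tool callback is about to return. */
static int event_wrapper__on_callback_exit(struct event_wrapper *w)
{
	size_t sz;
	void *copy;

	if (w->refcount <= 1 || w->owned)
		return 0;	/* nothing escaped or already copied */

	sz = w->event->header.size;
	copy = malloc(sz);
	if (copy == NULL)
		return -ENOMEM;

	memcpy(copy, w->event, sz);	/* the event escaped: own a copy */
	w->event = copy;
	w->owned = true;
	return 0;
}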
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/util/session.c | 98 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 98 insertions(+)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 2e777fd1bcf6707b..b781e01ddcb4876b 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1288,8 +1288,13 @@ static int evlist__deliver_sample(struct evlist *evlist, const struct perf_tool
> struct deferred_event {
> struct list_head list;
> union perf_event *event;
> + bool allocated;
> };
>
> +/*
> + * This is called when a deferred callchain record comes up. Find all matching
> + * samples, merge the callchains and process them.
> + */
> static int evlist__deliver_deferred_samples(struct evlist *evlist,
> const struct perf_tool *tool,
> union perf_event *event,
> @@ -1331,6 +1336,86 @@ static int evlist__deliver_deferred_samples(struct evlist *evlist,
> free(orig_sample.callchain);
>
> list_del(&de->list);
> + if (de->allocated)
> + free(de->event);
> + free(de);
> +
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> +/*
> + * This is called when the backing mmap is about to go away. It needs to save
> + * the original sample data until it finds the matching deferred callchains.
> + */
> +static void evlist__copy_deferred_samples(struct evlist *evlist,
> + const struct perf_tool *tool,
> + struct machine *machine)
> +{
> + struct deferred_event *de, *tmp;
> + struct evsel *evsel;
> + int ret = 0;
> +
> + list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
> + struct perf_sample sample;
> + size_t sz = de->event->header.size;
> + void *buf;
> +
> + if (de->allocated)
> + continue;
> +
> + buf = malloc(sz);
> + if (buf) {
> + memcpy(buf, de->event, sz);
> + de->event = buf;
> + de->allocated = true;
> + continue;
> + }
> +
> + /* The allocation failed, flush the sample now */
> + ret = evlist__parse_sample(evlist, de->event, &sample);
> + if (ret == 0) {
> + evsel = evlist__id2evsel(evlist, sample.id);
> + evlist__deliver_sample(evlist, tool, de->event,
> + &sample, evsel, machine);
> + }
> +
> + list_del(&de->list);
> + BUG_ON(de->allocated);
> + free(de);
> + }
> +}
> +
> +/*
> + * This is called at the end of the data processing for the session. Flush the
> + * remaining samples as there's no hope for matching deferred callchains.
> + */
> +static int evlist__flush_deferred_samples(struct evlist *evlist,
> + const struct perf_tool *tool,
> + struct machine *machine)
> +{
> + struct deferred_event *de, *tmp;
> + struct evsel *evsel;
> + int ret = 0;
> +
> + list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
> + struct perf_sample sample;
> +
> + ret = evlist__parse_sample(evlist, de->event, &sample);
> + if (ret < 0) {
> + pr_err("failed to parse original sample\n");
> + break;
> + }
> +
> + evsel = evlist__id2evsel(evlist, sample.id);
> + ret = evlist__deliver_sample(evlist, tool, de->event,
> + &sample, evsel, machine);
> +
> + list_del(&de->list);
> + if (de->allocated)
> + free(de->event);
> free(de);
>
> if (ret)
> @@ -1374,6 +1459,7 @@ static int machines__deliver_event(struct machines *machines,
> return -ENOMEM;
>
> de->event = event;
> + de->allocated = false;
> list_add_tail(&de->list, &evlist->deferred_samples);
> return 0;
> }
> @@ -2218,6 +2304,8 @@ reader__mmap(struct reader *rd, struct perf_session *session)
> }
>
> if (mmaps[rd->mmap_idx]) {
> + evlist__copy_deferred_samples(session->evlist, session->tool,
> + &session->machines.host);
> munmap(mmaps[rd->mmap_idx], rd->mmap_size);
> mmaps[rd->mmap_idx] = NULL;
> }
> @@ -2372,6 +2460,11 @@ static int __perf_session__process_events(struct perf_session *session)
> if (err)
> goto out_err;
> err = auxtrace__flush_events(session, tool);
> + if (err)
> + goto out_err;
> + err = evlist__flush_deferred_samples(session->evlist,
> + session->tool,
> + &session->machines.host);
> if (err)
> goto out_err;
> err = perf_session__flush_thread_stacks(session);
> @@ -2494,6 +2587,11 @@ static int __perf_session__process_dir_events(struct perf_session *session)
> if (ret)
> goto out_err;
>
> + ret = evlist__flush_deferred_samples(session->evlist, tool,
> + &session->machines.host);
> + if (ret)
> + goto out_err;
> +
> ret = perf_session__flush_thread_stacks(session);
> out_err:
> ui_progress__finish();
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
* Re: [PATCH v5 6/6] perf tools: Flush remaining samples w/o deferred callchains
2025-11-20 5:29 ` Ian Rogers
@ 2025-11-20 7:21 ` Ian Rogers
0 siblings, 0 replies; 11+ messages in thread
From: Ian Rogers @ 2025-11-20 7:21 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, James Clark, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Jens Remus,
Mathieu Desnoyers, linux-trace-kernel, bpf
On Wed, Nov 19, 2025 at 9:29 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Nov 19, 2025 at 6:11 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > It's possible that some kernel samples don't have matching deferred
> > callchain records when the profiling session was ended before the
> > threads came back to userspace. Let's flush the samples before
> > finish the session.
> >
> > Also 32-bit systems can see partial mmap for the data. In that case,
> > deferred samples won't point to the correct data once the mapping moves
> > to the next portion of the file. Copy the original sample before it
> > unmaps the current data.
>
> I think it is simpler to always copy. We may have events from
> synthesis, inject, .. and not the reader. Relying on callers to know
> that someone made a copy of the event and to make a defensive copy on
> their behalf just feels error prone.
>
> In the python session API I need to deal with the lifetime of events.
> Currently the events are copied:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/python.c?h=perf-tools-next#n507
> and I'm doing this for session tool callbacks:
> https://lore.kernel.org/lkml/20251029053413.355154-12-irogers@google.com/
> I think it can be made lazier by knowing the tool callback can assume
> the event and sample are valid. We can delay the copying of the
> event/sample for if the pyevent has a reference count >1 and we're
> returning out of the tool callback. Doing some kind of global
> knowledge in the reader for maintaining the correctness of memory, I'm
> just not clear on how to make it always work.
I believe that in pipe mode we always reuse the same memory for each event:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/session.c?h=perf-tools-next#n1868
so a lazy copy would be broken in the pipe mode case.
Thanks,
Ian
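To make that concrete, here is an illustrative reader, not the actual perf
pipe code, that reuses one buffer per event; any event pointer saved from a
previous callback is overwritten by the next read, which is why a lazy copy
cannot work there:

#include <stdio.h>
#include <string.h>

struct hdr { unsigned int size; };

static void read_loop(FILE *in, void (*deliver)(const void *event))
{
	char buf[64 * 1024];	/* single buffer reused for every event */
	struct hdr h;

	while (fread(&h, sizeof(h), 1, in) == 1 &&
	       h.size >= sizeof(h) && h.size <= sizeof(buf)) {
		memcpy(buf, &h, sizeof(h));
		if (h.size > sizeof(h) &&
		    fread(buf + sizeof(h), h.size - sizeof(h), 1, in) != 1)
			break;
		deliver(buf);	/* the previous event's bytes are gone now */
	}
}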
* Re: [PATCH v5 5/6] perf tools: Merge deferred user callchains
2025-11-20 5:13 ` Ian Rogers
@ 2025-11-20 21:44 ` Namhyung Kim
0 siblings, 0 replies; 11+ messages in thread
From: Namhyung Kim @ 2025-11-20 21:44 UTC (permalink / raw)
To: Ian Rogers
Cc: Arnaldo Carvalho de Melo, James Clark, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Jens Remus,
Mathieu Desnoyers, linux-trace-kernel, bpf
On Wed, Nov 19, 2025 at 09:13:50PM -0800, Ian Rogers wrote:
> On Wed, Nov 19, 2025 at 6:11 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Save samples with deferred callchains in a separate list and deliver
> > them after merging the user callchains. If users don't want the merge,
> > they can set tool->merge_deferred_callchains to false to prevent this
> > behavior.
> >
> > With the previous result, perf script will now show the merged callchains.
> >
> > $ perf script
> > ...
> > pwd 2312 121.163435: 249113 cpu/cycles/P:
> > ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
> > ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
> > ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
> > ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
> > ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
> > ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
> > ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
> > 7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> > 7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> > 7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> > ...
> >
> > The old output can be obtained using the --no-merge-callchains option.
> > Also, perf report now shows the user callchain entries at the end.
> >
> > $ perf report --no-children --stdio -q -S __build_id_parse.isra.0
> > # symbol: __build_id_parse.isra.0
> > 8.40% pwd [kernel.kallsyms]
> > |
> > ---__build_id_parse.isra.0
> > perf_event_mmap
> > mprotect_fixup
> > do_mprotect_pkey
> > __x64_sys_mprotect
> > do_syscall_64
> > entry_SYSCALL_64_after_hwframe
> > mprotect
> > _dl_sysdep_start
> > _dl_start_user
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > [...]
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index 361e15c1f26a96d0..2e777fd1bcf6707b 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -1285,6 +1285,60 @@ static int evlist__deliver_sample(struct evlist *evlist, const struct perf_tool
> > per_thread);
> > }
> >
> > +struct deferred_event {
> > + struct list_head list;
> > + union perf_event *event;
>
> Is this the old version of the patch? No comment and it seems the
> event's memory isn't copied. I'm worried as we have events in stack
> allocated memory such as:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/session.c?h=perf-tools-next#n1618
> or copies:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/builtin-inject.c?h=perf-tools-next#n336
> or just because the ring buffer overwrites itself. My belief is that
> we don't hold events, and the associated parsed sample, longer than
> the tool callback because it'll be wrong/corrupt after that. Here the
> deferred callchain events are all being held longer than a single tool
> event callback.
I added the conditional copy in the next patch but, as you said, it'd be
safer to always copy. The concern is the performance of perf report,
which is already slow, and I don't want to make it worse. But let's
start with the simple, safe version and optimize it later.
Thanks,
Namhyung
Thread overview: 11+ messages
2025-11-20 2:10 [PATCHSET v5 0/6] perf tools: Add deferred callchain support Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 5/6] perf tools: Merge deferred user callchains Namhyung Kim
2025-11-20 5:13 ` Ian Rogers
2025-11-20 21:44 ` Namhyung Kim
2025-11-20 2:10 ` [PATCH v5 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
2025-11-20 5:29 ` Ian Rogers
2025-11-20 7:21 ` Ian Rogers