linux-perf-users.vger.kernel.org archive mirror
* [PATCHSET v6 0/6] perf tools: Add deferred callchain support
@ 2025-11-20 23:47 Namhyung Kim
  2025-11-20 23:47 ` [PATCH v6 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-11-20 23:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

Hello,

This is a new version of the deferred callchain support, now that the
kernel part has been merged into the tip tree.  It is based on Steve's
work (v16).

  https://lore.kernel.org/r/20250908175319.841517121@kernel.org

v6 changes)

* always copy the events  (Ian)
* add flush in the pipe mode

v5: https://lore.kernel.org/r/20251120021046.94490-1-namhyung@kernel.org/

* update delegate tools  (Ian)
* copy and flush remaining samples  (Ian)
* add Ian's Reviewed-by tags

v4: https://lore.kernel.org/r/20251115234106.348571-1-namhyung@kernel.org

* add --call-graph fp,defer option   (Ian, Steve)
* add more comments on the cookie  (Ian)
* display cookie part in the deferred callchain  (Ian)

v3: https://lore.kernel.org/r/20251114070018.160330-1-namhyung@kernel.org

* handle new attr.defer_output to generate deferred callchains
* fix crash when cookies don't match  (Steven)
* disable merging for perf inject
* fix missing feature detection bug
* symbolize merged callchains properly

Here's an example session.

  $ perf record --call-graph fp,defer  pwd
  /home/namhyung/project/linux
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.010 MB perf.data (29 samples) ]

  $ perf evlist -v
  cpu/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES),
  { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CALLCHAIN|PERIOD,
  read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1,
  task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, build_id: 1,
  defer_callchain: 1, defer_output: 1

  $ perf script
  ...
  pwd    2312   121.163435:     249113 cpu/cycles/P:
          ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
          ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
          ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
          ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
          ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
          ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
          ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
              7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
  ...

  $ perf script --no-merge-callchains
  ...
  pwd    2312   121.163435:     249113 cpu/cycles/P:
          ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
          ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
          ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
          ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
          ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
          ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
          ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
                 b00000006 (cookie) ([unknown])

  pwd    2312   121.163447: DEFERRED CALLCHAIN [cookie: b00000006]
              7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
  ...
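
To make the relation between the merged and the --no-merge-callchains
output concrete, here is a minimal sketch of the stitching step.  It is
not the code from this series: stitch_callchain() and the CTX_* names are
made up for illustration, and it assumes that the marker entry is kept
and the cookie slot is replaced by the deferred user frames once the
cookies match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Context marker value as in the synced UAPI header. */
#define CTX_USER_DEFERRED	((uint64_t)-640)

/*
 * Hypothetical helper: stitch a kernel-only sample callchain, which ends
 * with [CTX_USER_DEFERRED, cookie], with a deferred user callchain.
 *
 * Assumption: the marker stays in place and the cookie is replaced by
 * the user frames.  Returns the number of entries written to @out, or 0
 * if the sample has no marker, the cookies differ, or @out is too small.
 */
static size_t stitch_callchain(const uint64_t *sample_ips, size_t sample_nr,
			       uint64_t deferred_cookie,
			       const uint64_t *deferred_ips, size_t deferred_nr,
			       uint64_t *out, size_t out_cap)
{
	size_t keep_nr, i, n = 0;

	/* The last two entries must be the marker and the cookie. */
	if (sample_nr < 2 || sample_ips[sample_nr - 2] != CTX_USER_DEFERRED)
		return 0;
	if (sample_ips[sample_nr - 1] != deferred_cookie)
		return 0;

	keep_nr = sample_nr - 1;	/* everything except the cookie */
	if (keep_nr + deferred_nr > out_cap)
		return 0;

	for (i = 0; i < keep_nr; i++)
		out[n++] = sample_ips[i];
	for (i = 0; i < deferred_nr; i++)
		out[n++] = deferred_ips[i];

	return n;
}
```

With the sample above, the kernel part ending in cookie b00000006 and the
three user frames from the DEFERRED CALLCHAIN record would end up in one
array, which is what the default (merged) perf script output displays.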

The code is available in the 'perf/defer-callchain-v6' branch at

  git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (6):
  tools headers UAPI: Sync linux/perf_event.h for deferred callchains
  perf tools: Minimal DEFERRED_CALLCHAIN support
  perf record: Add --call-graph fp,defer option for deferred callchains
  perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
  perf tools: Merge deferred user callchains
  perf tools: Flush remaining samples w/o deferred callchains

 tools/include/uapi/linux/perf_event.h     |  21 +++-
 tools/lib/perf/include/perf/event.h       |  13 ++
 tools/perf/Documentation/perf-config.txt  |   3 +
 tools/perf/Documentation/perf-record.txt  |   4 +
 tools/perf/Documentation/perf-script.txt  |   5 +
 tools/perf/builtin-inject.c               |   1 +
 tools/perf/builtin-report.c               |   1 +
 tools/perf/builtin-script.c               |  93 ++++++++++++++
 tools/perf/util/callchain.c               |  45 ++++++-
 tools/perf/util/callchain.h               |   4 +
 tools/perf/util/event.c                   |   1 +
 tools/perf/util/evlist.c                  |   1 +
 tools/perf/util/evlist.h                  |   2 +
 tools/perf/util/evsel.c                   |  50 +++++++-
 tools/perf/util/evsel.h                   |   1 +
 tools/perf/util/evsel_fprintf.c           |   5 +-
 tools/perf/util/machine.c                 |   1 +
 tools/perf/util/perf_event_attr_fprintf.c |   2 +
 tools/perf/util/sample.h                  |   2 +
 tools/perf/util/session.c                 | 147 ++++++++++++++++++++++
 tools/perf/util/tool.c                    |   5 +
 tools/perf/util/tool.h                    |   4 +-
 22 files changed, 403 insertions(+), 8 deletions(-)

-- 
2.52.0.rc2.455.g230fcf2819-goog



* [PATCH v6 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains
  2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
@ 2025-11-20 23:47 ` Namhyung Kim
  2025-11-20 23:48 ` [PATCH v6 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support Namhyung Kim
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-11-20 23:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

Sync the header with the kernel to pick up the UAPI changes that user
space needs to support deferred callchains.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/include/uapi/linux/perf_event.h | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 78a362b8002776e5..d292f96bc06f86bc 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -463,7 +463,9 @@ struct perf_event_attr {
 				inherit_thread :  1, /* children only inherit if cloned with CLONE_THREAD */
 				remove_on_exec :  1, /* event is removed from task on exec */
 				sigtrap        :  1, /* send synchronous SIGTRAP on event */
-				__reserved_1   : 26;
+				defer_callchain:  1, /* request PERF_RECORD_CALLCHAIN_DEFERRED records */
+				defer_output   :  1, /* output PERF_RECORD_CALLCHAIN_DEFERRED records */
+				__reserved_1   : 24;
 
 	union {
 		__u32		wakeup_events;	  /* wake up every n events */
@@ -1239,6 +1241,22 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_AUX_OUTPUT_HW_ID		= 21,
 
+	/*
+	 * This user callchain capture was deferred until shortly before
+	 * returning to user space.  Previous samples would have kernel
+	 * callchains only and they need to be stitched with this to make full
+	 * callchains.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				cookie;
+	 *	u64				nr;
+	 *	u64				ips[nr];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_CALLCHAIN_DEFERRED		= 22,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
@@ -1269,6 +1287,7 @@ enum perf_callchain_context {
 	PERF_CONTEXT_HV				= (__u64)-32,
 	PERF_CONTEXT_KERNEL			= (__u64)-128,
 	PERF_CONTEXT_USER			= (__u64)-512,
+	PERF_CONTEXT_USER_DEFERRED		= (__u64)-640,
 
 	PERF_CONTEXT_GUEST			= (__u64)-2048,
 	PERF_CONTEXT_GUEST_KERNEL		= (__u64)-2176,
-- 
2.52.0.rc2.455.g230fcf2819-goog



* [PATCH v6 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support
  2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
  2025-11-20 23:47 ` [PATCH v6 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
@ 2025-11-20 23:48 ` Namhyung Kim
  2025-11-20 23:48 ` [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-11-20 23:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

Add a new event type for deferred callchains and a new callback in
struct perf_tool.  For now it doesn't actually handle the deferred
callchains; it just marks the sample if the callchain array contains
PERF_CONTEXT_USER_DEFERRED.

With this change, perf report can at least dump the raw data.  Enabling
attr.defer_callchain requires the next commit, but if you already have a
data file, it shows the following result.

  $ perf report -D
  ...
  0x2158@perf.data [0x40]: event: 22
  .
  . ... raw event: size 64 bytes
  .  0000:  16 00 00 00 02 00 40 00 06 00 00 00 0b 00 00 00  ......@.........
  .  0010:  03 00 00 00 00 00 00 00 a7 7f 33 fe 18 7f 00 00  ..........3.....
  .  0020:  0f 0e 33 fe 18 7f 00 00 48 14 33 fe 18 7f 00 00  ..3.....H.3.....
  .  0030:  08 09 00 00 08 09 00 00 e6 7a e7 35 1c 00 00 00  .........z.5....

  121163447014 0x2158 [0x40]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 2312/2312: 0xb00000006
  ... FP chain: nr:3
  .....  0: 00007f18fe337fa7
  .....  1: 00007f18fe330e0f
  .....  2: 00007f18fe331448
  : unhandled!
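
The raw bytes above can be decoded by hand against the record layout
(header, cookie, nr, ips[nr], sample_id).  The following is only a
sketch, not code from this patch: struct deferred_view, le_read() and
decode_deferred() are hypothetical names, the data is assumed to be
little-endian, and the trailing sample_id is assumed to hold just
pid, tid and time, which matches the TID|TIME sample_type of the
example session.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 64 raw bytes from the dump above. */
static const uint8_t raw_event[64] = {
	0x16, 0x00, 0x00, 0x00, 0x02, 0x00, 0x40, 0x00,
	0x06, 0x00, 0x00, 0x00, 0x0b, 0x00, 0x00, 0x00,
	0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xa7, 0x7f, 0x33, 0xfe, 0x18, 0x7f, 0x00, 0x00,
	0x0f, 0x0e, 0x33, 0xfe, 0x18, 0x7f, 0x00, 0x00,
	0x48, 0x14, 0x33, 0xfe, 0x18, 0x7f, 0x00, 0x00,
	0x08, 0x09, 0x00, 0x00, 0x08, 0x09, 0x00, 0x00,
	0xe6, 0x7a, 0xe7, 0x35, 0x1c, 0x00, 0x00, 0x00,
};

/* Hypothetical decoded view of a PERF_RECORD_CALLCHAIN_DEFERRED record. */
struct deferred_view {
	uint32_t type;
	uint16_t misc, size;
	uint64_t cookie, nr;
	const uint8_t *ips;	/* nr little-endian u64 values */
	uint32_t pid, tid;	/* from the trailing sample_id */
	uint64_t time;
};

/* Read a little-endian integer of @len bytes (len <= 8). */
static uint64_t le_read(const uint8_t *p, size_t len)
{
	uint64_t v = 0;

	while (len--)
		v = (v << 8) | p[len];
	return v;
}

/* Returns 0 on success, -1 if the buffer does not look like event 22. */
static int decode_deferred(const uint8_t *buf, size_t len,
			   struct deferred_view *out)
{
	size_t ips_end;

	if (len < 24)
		return -1;
	out->type = (uint32_t)le_read(buf, 4);
	out->misc = (uint16_t)le_read(buf + 4, 2);
	out->size = (uint16_t)le_read(buf + 6, 2);
	if (out->type != 22 || out->size < 24 || out->size > len)
		return -1;

	out->cookie = le_read(buf + 8, 8);
	out->nr = le_read(buf + 16, 8);
	if (out->nr > (uint64_t)(out->size - 24) / 8)
		return -1;
	out->ips = buf + 24;

	/* Assumed sample_id layout: u32 pid, u32 tid, u64 time. */
	ips_end = 24 + (size_t)out->nr * 8;
	if (out->size - ips_end < 16)
		return -1;
	out->pid = (uint32_t)le_read(buf + ips_end, 4);
	out->tid = (uint32_t)le_read(buf + ips_end + 4, 4);
	out->time = le_read(buf + ips_end + 8, 8);
	return 0;
}

/* Fetch the i-th address of the deferred callchain. */
static uint64_t deferred_ip(const struct deferred_view *v, uint64_t i)
{
	assert(i < v->nr);
	return le_read(v->ips + i * 8, 8);
}
```

Calling decode_deferred(raw_event, sizeof(raw_event), &v) on these bytes
yields the same cookie, pid/tid, timestamp and three addresses that
perf report -D prints above.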

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/lib/perf/include/perf/event.h       | 13 ++++++++++
 tools/perf/util/event.c                   |  1 +
 tools/perf/util/evsel.c                   | 31 +++++++++++++++++++++--
 tools/perf/util/machine.c                 |  1 +
 tools/perf/util/perf_event_attr_fprintf.c |  2 ++
 tools/perf/util/sample.h                  |  2 ++
 tools/perf/util/session.c                 | 20 +++++++++++++++
 tools/perf/util/tool.c                    |  3 +++
 tools/perf/util/tool.h                    |  3 ++-
 9 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index aa1e91c97a226e1a..43a8cb04994fa033 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -151,6 +151,18 @@ struct perf_record_switch {
 	__u32			 next_prev_tid;
 };
 
+struct perf_record_callchain_deferred {
+	struct perf_event_header header;
+	/*
+	 * This is to match kernel and (deferred) user stacks together.
+	 * The kernel part will be in the sample callchain array after
+	 * the PERF_CONTEXT_USER_DEFERRED entry.
+	 */
+	__u64			 cookie;
+	__u64			 nr;
+	__u64			 ips[];
+};
+
 struct perf_record_header_attr {
 	struct perf_event_header header;
 	struct perf_event_attr	 attr;
@@ -523,6 +535,7 @@ union perf_event {
 	struct perf_record_read			read;
 	struct perf_record_throttle		throttle;
 	struct perf_record_sample		sample;
+	struct perf_record_callchain_deferred	callchain_deferred;
 	struct perf_record_bpf_event		bpf;
 	struct perf_record_ksymbol		ksymbol;
 	struct perf_record_text_poke_event	text_poke;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index fcf44149feb20c35..4c92cc1a952c1d9f 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -61,6 +61,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_CGROUP]			= "CGROUP",
 	[PERF_RECORD_TEXT_POKE]			= "TEXT_POKE",
 	[PERF_RECORD_AUX_OUTPUT_HW_ID]		= "AUX_OUTPUT_HW_ID",
+	[PERF_RECORD_CALLCHAIN_DEFERRED]	= "CALLCHAIN_DEFERRED",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index aee42666e882daab..f1a311637694ac0a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3089,6 +3089,20 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 	data->data_src = PERF_MEM_DATA_SRC_NONE;
 	data->vcpu = -1;
 
+	if (event->header.type == PERF_RECORD_CALLCHAIN_DEFERRED) {
+		const u64 max_callchain_nr = UINT64_MAX / sizeof(u64);
+
+		data->callchain = (struct ip_callchain *)&event->callchain_deferred.nr;
+		if (data->callchain->nr > max_callchain_nr)
+			return -EFAULT;
+
+		data->deferred_cookie = event->callchain_deferred.cookie;
+
+		if (evsel->core.attr.sample_id_all)
+			perf_evsel__parse_id_sample(evsel, event, data);
+		return 0;
+	}
+
 	if (event->header.type != PERF_RECORD_SAMPLE) {
 		if (!evsel->core.attr.sample_id_all)
 			return 0;
@@ -3213,12 +3227,25 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 
 	if (type & PERF_SAMPLE_CALLCHAIN) {
 		const u64 max_callchain_nr = UINT64_MAX / sizeof(u64);
+		u64 callchain_nr;
 
 		OVERFLOW_CHECK_u64(array);
 		data->callchain = (struct ip_callchain *)array++;
-		if (data->callchain->nr > max_callchain_nr)
+		callchain_nr = data->callchain->nr;
+		if (callchain_nr > max_callchain_nr)
 			return -EFAULT;
-		sz = data->callchain->nr * sizeof(u64);
+		sz = callchain_nr * sizeof(u64);
+		/*
+		 * Save the cookie for the deferred user callchain.  The last 2
+		 * entries in the callchain should be the context marker and the
+		 * cookie.  The cookie will be used to match PERF_RECORD_
+		 * CALLCHAIN_DEFERRED later.
+		 */
+		if (evsel->core.attr.defer_callchain && callchain_nr >= 2 &&
+		    data->callchain->ips[callchain_nr - 2] == PERF_CONTEXT_USER_DEFERRED) {
+			data->deferred_cookie = data->callchain->ips[callchain_nr - 1];
+			data->deferred_callchain = true;
+		}
 		OVERFLOW_CHECK(array, sz, max_size);
 		array = (void *)array + sz;
 	}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b5dd42588c916d91..841b711d970e9457 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2124,6 +2124,7 @@ static int add_callchain_ip(struct thread *thread,
 				*cpumode = PERF_RECORD_MISC_KERNEL;
 				break;
 			case PERF_CONTEXT_USER:
+			case PERF_CONTEXT_USER_DEFERRED:
 				*cpumode = PERF_RECORD_MISC_USER;
 				break;
 			default:
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 66b666d9ce649dd7..741c3d657a8b6ae7 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -343,6 +343,8 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(inherit_thread, p_unsigned);
 	PRINT_ATTRf(remove_on_exec, p_unsigned);
 	PRINT_ATTRf(sigtrap, p_unsigned);
+	PRINT_ATTRf(defer_callchain, p_unsigned);
+	PRINT_ATTRf(defer_output, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned, false);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index fae834144ef42105..a8307b20a9ea8066 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -107,6 +107,8 @@ struct perf_sample {
 	/** @weight3: On x86 holds retire_lat, on powerpc holds p_stage_cyc. */
 	u16 weight3;
 	bool no_hw_idx;		/* No hw_idx collected in branch_stack */
+	bool deferred_callchain;	/* Has deferred user callchains */
+	u64 deferred_cookie;
 	char insn[MAX_INSN];
 	void *raw_data;
 	struct ip_callchain *callchain;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4b0236b2df2913e1..361e15c1f26a96d0 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -720,6 +720,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_CGROUP]		  = perf_event__cgroup_swap,
 	[PERF_RECORD_TEXT_POKE]		  = perf_event__text_poke_swap,
 	[PERF_RECORD_AUX_OUTPUT_HW_ID]	  = perf_event__all64_swap,
+	[PERF_RECORD_CALLCHAIN_DEFERRED]  = perf_event__all64_swap,
 	[PERF_RECORD_HEADER_ATTR]	  = perf_event__hdr_attr_swap,
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -854,6 +855,9 @@ static void callchain__printf(struct evsel *evsel,
 	for (i = 0; i < callchain->nr; i++)
 		printf("..... %2d: %016" PRIx64 "\n",
 		       i, callchain->ips[i]);
+
+	if (sample->deferred_callchain)
+		printf("...... (deferred)\n");
 }
 
 static void branch_stack__printf(struct perf_sample *sample,
@@ -1123,6 +1127,19 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
 		sample_read__printf(sample, evsel->core.attr.read_format);
 }
 
+static void dump_deferred_callchain(struct evsel *evsel, union perf_event *event,
+				    struct perf_sample *sample)
+{
+	if (!dump_trace)
+		return;
+
+	printf("(IP, 0x%x): %d/%d: %#" PRIx64 "\n",
+	       event->header.misc, sample->pid, sample->tid, sample->deferred_cookie);
+
+	if (evsel__has_callchain(evsel))
+		callchain__printf(evsel, sample);
+}
+
 static void dump_read(struct evsel *evsel, union perf_event *event)
 {
 	struct perf_record_read *read_event = &event->read;
@@ -1353,6 +1370,9 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->text_poke(tool, event, sample, machine);
 	case PERF_RECORD_AUX_OUTPUT_HW_ID:
 		return tool->aux_output_hw_id(tool, event, sample, machine);
+	case PERF_RECORD_CALLCHAIN_DEFERRED:
+		dump_deferred_callchain(evsel, event, sample);
+		return tool->callchain_deferred(tool, event, sample, evsel, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c
index 22a8a4ffe05f778e..e77f0e2ecc1f79db 100644
--- a/tools/perf/util/tool.c
+++ b/tools/perf/util/tool.c
@@ -287,6 +287,7 @@ void perf_tool__init(struct perf_tool *tool, bool ordered_events)
 	tool->read = process_event_sample_stub;
 	tool->throttle = process_event_stub;
 	tool->unthrottle = process_event_stub;
+	tool->callchain_deferred = process_event_sample_stub;
 	tool->attr = process_event_synth_attr_stub;
 	tool->event_update = process_event_synth_event_update_stub;
 	tool->tracing_data = process_event_synth_tracing_data_stub;
@@ -335,6 +336,7 @@ bool perf_tool__compressed_is_stub(const struct perf_tool *tool)
 	}
 CREATE_DELEGATE_SAMPLE(read);
 CREATE_DELEGATE_SAMPLE(sample);
+CREATE_DELEGATE_SAMPLE(callchain_deferred);
 
 #define CREATE_DELEGATE_ATTR(name)					\
 	static int delegate_ ## name(const struct perf_tool *tool,	\
@@ -468,6 +470,7 @@ void delegate_tool__init(struct delegate_tool *tool, struct perf_tool *delegate)
 	tool->tool.ksymbol = delegate_ksymbol;
 	tool->tool.bpf = delegate_bpf;
 	tool->tool.text_poke = delegate_text_poke;
+	tool->tool.callchain_deferred = delegate_callchain_deferred;
 
 	tool->tool.attr = delegate_attr;
 	tool->tool.event_update = delegate_event_update;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 88337cee1e3e2be3..9b9f0a8cbf3de4b5 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -44,7 +44,8 @@ enum show_feature_header {
 
 struct perf_tool {
 	event_sample	sample,
-			read;
+			read,
+			callchain_deferred;
 	event_op	mmap,
 			mmap2,
 			comm,
-- 
2.52.0.rc2.455.g230fcf2819-goog



* [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains
  2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
  2025-11-20 23:47 ` [PATCH v6 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
  2025-11-20 23:48 ` [PATCH v6 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support Namhyung Kim
@ 2025-11-20 23:48 ` Namhyung Kim
  2025-11-21  6:26   ` Thomas Richter
  2025-12-03  5:49   ` Namhyung Kim
  2025-11-20 23:48 ` [PATCH v6 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Namhyung Kim
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-11-20 23:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

Add a new callchain record mode option for deferred callchains.  For now
it only works with FP (frame-pointer) mode.
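
As a rough illustration of the option syntax, the sketch below parses
"fp", "fp,N" and "fp,defer" and rejects "defer" with any other mode.
It is a simplified stand-in, not the parser in util/callchain.c: all
sketch_* names are hypothetical, and it skips the sysctl max_stack
check and the dwarf dump size handling that the real code has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum sketch_mode { SKETCH_NONE, SKETCH_FP, SKETCH_DWARF, SKETCH_LBR };

/* Hypothetical, reduced version of struct callchain_param. */
struct sketch_param {
	enum sketch_mode mode;
	bool defer;
	unsigned long max_stack;	/* 0 means "not given" */
};

/* Returns 0 on success, -1 on an unknown mode or "defer" without "fp". */
static int sketch_parse_call_graph(const char *arg, struct sketch_param *p)
{
	const char *comma = strchr(arg, ',');
	size_t mode_len = comma ? (size_t)(comma - arg) : strlen(arg);
	const char *opt = comma ? comma + 1 : NULL;

	p->mode = SKETCH_NONE;
	p->defer = false;
	p->max_stack = 0;

	if (mode_len == 2 && !strncmp(arg, "fp", 2))
		p->mode = SKETCH_FP;
	else if (mode_len == 5 && !strncmp(arg, "dwarf", 5))
		p->mode = SKETCH_DWARF;
	else if (mode_len == 3 && !strncmp(arg, "lbr", 3))
		p->mode = SKETCH_LBR;
	else
		return -1;

	/* The second token is either "defer" or, for fp, a stack depth. */
	if (opt && !strcmp(opt, "defer"))
		p->defer = true;
	else if (opt && p->mode == SKETCH_FP)
		p->max_stack = strtoul(opt, NULL, 0);

	/* Deferred callchains only work with frame pointers for now. */
	if (p->defer && p->mode != SKETCH_FP)
		return -1;

	return 0;
}
```

In this sketch "fp,defer" sets the defer flag, "fp,32" sets a depth of
32, and "dwarf,defer" is rejected.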

Also add the missing-feature detection logic to clear the flags on old
kernels.

  $ perf record --call-graph fp,defer -vv true
  ...
  ------------------------------------------------------------
  perf_event_attr:
    type                             0 (PERF_TYPE_HARDWARE)
    size                             136
    config                           0 (PERF_COUNT_HW_CPU_CYCLES)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|CALLCHAIN|PERIOD
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    mmap                             1
    comm                             1
    freq                             1
    enable_on_exec                   1
    task                             1
    sample_id_all                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
    defer_callchain                  1
    defer_output                     1
  ------------------------------------------------------------
  sys_perf_event_open: pid 162755  cpu 0  group_fd -1  flags 0x8
  sys_perf_event_open failed, error -22
  switching off deferred callchain support
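
The fallback seen in the log above can be modelled roughly as follows.
This is a sketch under simplifying assumptions, not the logic in
util/evsel.c: the real code probes with a minimal attr through
has_attr_feature(), while this version simply retries once after
-EINVAL.  The sketch_* names and the function pointer standing in for
perf_event_open() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced attr with only the two new bits. */
struct sketch_attr {
	bool defer_callchain;
	bool defer_output;
};

/* Stand-in for perf_event_open(): returns 0 or a negative errno. */
typedef int (*sketch_open_fn)(const struct sketch_attr *attr);

/* Remembered across calls, like perf_missing_features.defer_callchain. */
static bool sketch_missing_defer;

static int sketch_open_with_fallback(struct sketch_attr *attr,
				     sketch_open_fn open_fn)
{
	int err;

	/* Already known to be unsupported: clear the bits up front. */
	if (sketch_missing_defer) {
		attr->defer_callchain = false;
		attr->defer_output = false;
	}

	err = open_fn(attr);
	if (err != -EINVAL || !(attr->defer_callchain || attr->defer_output))
		return err;

	/* Assume the new bits were rejected; retry without them. */
	sketch_missing_defer = true;
	attr->defer_callchain = false;
	attr->defer_output = false;
	return open_fn(attr);
}

/* Models an old kernel that rejects the new bits. */
static int sketch_old_kernel_open(const struct sketch_attr *attr)
{
	return (attr->defer_callchain || attr->defer_output) ? -EINVAL : 0;
}

/* Models a kernel with deferred callchain support. */
static int sketch_new_kernel_open(const struct sketch_attr *attr)
{
	(void)attr;
	return 0;
}
```

On the modelled old kernel the first open fails with -EINVAL, both bits
are cleared and the retry succeeds, so recording continues with regular
callchains.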

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-config.txt |  3 +++
 tools/perf/Documentation/perf-record.txt |  4 ++++
 tools/perf/util/callchain.c              | 16 +++++++++++++---
 tools/perf/util/callchain.h              |  1 +
 tools/perf/util/evsel.c                  | 19 +++++++++++++++++++
 tools/perf/util/evsel.h                  |  1 +
 6 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt
index c6f33565966735fe..642d1c490d9e3bcd 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -452,6 +452,9 @@ Variables
 		kernel space is controlled not by this option but by the
 		kernel config (CONFIG_UNWINDER_*).
 
+		The 'defer' mode can be used with 'fp' mode to enable deferred
+		user callchains (like 'fp,defer').
+
 	call-graph.dump-size::
 		The size of stack to dump in order to do post-unwinding. Default is 8192 (byte).
 		When using dwarf into record-mode, the default size will be used if omitted.
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 067891bd7da6edc8..e8b9aadbbfa50574 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -325,6 +325,10 @@ OPTIONS
 	by default.  User can change the number by passing it after comma
 	like "--call-graph fp,32".
 
+	Also "defer" can be used with "fp" (like "--call-graph fp,defer") to
+	enable deferred user callchain which will collect user-space callchains
+	when the thread returns to the user space.
+
 -q::
 --quiet::
 	Don't print any warnings or messages, useful for scripting.
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index d7b7eef740b9d6ed..2884187ccbbecfdc 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -275,9 +275,13 @@ int parse_callchain_record(const char *arg, struct callchain_param *param)
 			if (tok) {
 				unsigned long size;
 
-				size = strtoul(tok, &name, 0);
-				if (size < (unsigned) sysctl__max_stack())
-					param->max_stack = size;
+				if (!strncmp(tok, "defer", sizeof("defer"))) {
+					param->defer = true;
+				} else {
+					size = strtoul(tok, &name, 0);
+					if (size < (unsigned) sysctl__max_stack())
+						param->max_stack = size;
+				}
 			}
 			break;
 
@@ -314,6 +318,12 @@ int parse_callchain_record(const char *arg, struct callchain_param *param)
 	} while (0);
 
 	free(buf);
+
+	if (param->defer && param->record_mode != CALLCHAIN_FP) {
+		pr_err("callchain: deferred callchain only works with FP\n");
+		return -EINVAL;
+	}
+
 	return ret;
 }
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 86ed9e4d04f9ee7b..d5ae4fbb7ce5fa44 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -98,6 +98,7 @@ extern bool dwarf_callchain_users;
 
 struct callchain_param {
 	bool			enabled;
+	bool			defer;
 	enum perf_call_graph_mode record_mode;
 	u32			dump_size;
 	enum chain_mode 	mode;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f1a311637694ac0a..887c6ac6c49cc415 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1065,6 +1065,9 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
 		pr_info("Disabling user space callchains for function trace event.\n");
 		attr->exclude_callchain_user = 1;
 	}
+
+	if (param->defer && !attr->exclude_callchain_user)
+		attr->defer_callchain = 1;
 }
 
 void evsel__config_callchain(struct evsel *evsel, struct record_opts *opts,
@@ -1511,6 +1514,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	attr->mmap2    = track && !perf_missing_features.mmap2;
 	attr->comm     = track;
 	attr->build_id = track && opts->build_id;
+	attr->defer_output = track && callchain->defer;
 
 	/*
 	 * ksymbol is tracked separately with text poke because it needs to be
@@ -2199,6 +2203,10 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 static void evsel__disable_missing_features(struct evsel *evsel)
 {
+	if (perf_missing_features.defer_callchain && evsel->core.attr.defer_callchain)
+		evsel->core.attr.defer_callchain = 0;
+	if (perf_missing_features.defer_callchain && evsel->core.attr.defer_output)
+		evsel->core.attr.defer_output = 0;
 	if (perf_missing_features.inherit_sample_read && evsel->core.attr.inherit &&
 	    (evsel->core.attr.sample_type & PERF_SAMPLE_READ))
 		evsel->core.attr.inherit = 0;
@@ -2473,6 +2481,13 @@ static bool evsel__detect_missing_features(struct evsel *evsel, struct perf_cpu
 
 	/* Please add new feature detection here. */
 
+	attr.defer_callchain = true;
+	if (has_attr_feature(&attr, /*flags=*/0))
+		goto found;
+	perf_missing_features.defer_callchain = true;
+	pr_debug2("switching off deferred callchain support\n");
+	attr.defer_callchain = false;
+
 	attr.inherit = true;
 	attr.sample_type = PERF_SAMPLE_READ | PERF_SAMPLE_TID;
 	if (has_attr_feature(&attr, /*flags=*/0))
@@ -2584,6 +2599,10 @@ static bool evsel__detect_missing_features(struct evsel *evsel, struct perf_cpu
 	errno = old_errno;
 
 check:
+	if ((evsel->core.attr.defer_callchain || evsel->core.attr.defer_output) &&
+	    perf_missing_features.defer_callchain)
+		return true;
+
 	if (evsel->core.attr.inherit &&
 	    (evsel->core.attr.sample_type & PERF_SAMPLE_READ) &&
 	    perf_missing_features.inherit_sample_read)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3ae4ac8f9a37e009..a08130ff2e47a887 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -221,6 +221,7 @@ struct perf_missing_features {
 	bool branch_counters;
 	bool aux_action;
 	bool inherit_sample_read;
+	bool defer_callchain;
 };
 
 extern struct perf_missing_features perf_missing_features;
-- 
2.52.0.rc2.455.g230fcf2819-goog



* [PATCH v6 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
  2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
                   ` (2 preceding siblings ...)
  2025-11-20 23:48 ` [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
@ 2025-11-20 23:48 ` Namhyung Kim
  2025-12-12 12:11   ` Jens Remus
  2025-11-20 23:48 ` [PATCH v6 5/6] perf tools: Merge deferred user callchains Namhyung Kim
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2025-11-20 23:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

Handle the deferred callchains in the script output.

  $ perf script
  ...
  pwd    2312   121.163435:     249113 cpu/cycles/P:
          ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
          ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
          ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
          ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
          ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
          ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
          ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
                 b00000006 (cookie) ([unknown])

  pwd    2312   121.163447: DEFERRED CALLCHAIN [cookie: b00000006]
              7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-script.c     | 89 +++++++++++++++++++++++++++++++++
 tools/perf/util/evsel_fprintf.c |  5 +-
 2 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 011962e1ee0f6898..85b42205a71b3993 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2706,6 +2706,94 @@ static int process_sample_event(const struct perf_tool *tool,
 	return ret;
 }
 
+static int process_deferred_sample_event(const struct perf_tool *tool,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct evsel *evsel,
+					 struct machine *machine)
+{
+	struct perf_script *scr = container_of(tool, struct perf_script, tool);
+	struct perf_event_attr *attr = &evsel->core.attr;
+	struct evsel_script *es = evsel->priv;
+	unsigned int type = output_type(attr->type);
+	struct addr_location al;
+	FILE *fp = es->fp;
+	int ret = 0;
+
+	if (output[type].fields == 0)
+		return 0;
+
+	/* Set thread to NULL to indicate al is not initialized */
+	addr_location__init(&al);
+
+	if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
+					  sample->time)) {
+		goto out_put;
+	}
+
+	if (debug_mode) {
+		if (sample->time < last_timestamp) {
+			pr_err("Samples misordered, previous: %" PRIu64
+				" this: %" PRIu64 "\n", last_timestamp,
+				sample->time);
+			nr_unordered++;
+		}
+		last_timestamp = sample->time;
+		goto out_put;
+	}
+
+	if (filter_cpu(sample))
+		goto out_put;
+
+	if (machine__resolve(machine, &al, sample) < 0) {
+		pr_err("problem processing %d event, skipping it.\n",
+		       event->header.type);
+		ret = -1;
+		goto out_put;
+	}
+
+	if (al.filtered)
+		goto out_put;
+
+	if (!show_event(sample, evsel, al.thread, &al, NULL))
+		goto out_put;
+
+	if (evswitch__discard(&scr->evswitch, evsel))
+		goto out_put;
+
+	perf_sample__fprintf_start(scr, sample, al.thread, evsel,
+				   PERF_RECORD_CALLCHAIN_DEFERRED, fp);
+	fprintf(fp, "DEFERRED CALLCHAIN [cookie: %llx]",
+		(unsigned long long)event->callchain_deferred.cookie);
+
+	if (PRINT_FIELD(IP)) {
+		struct callchain_cursor *cursor = NULL;
+
+		if (symbol_conf.use_callchain && sample->callchain) {
+			cursor = get_tls_callchain_cursor();
+			if (thread__resolve_callchain(al.thread, cursor, evsel,
+						      sample, NULL, NULL,
+						      scripting_max_stack)) {
+				pr_info("cannot resolve deferred callchains\n");
+				cursor = NULL;
+			}
+		}
+
+		fputc(cursor ? '\n' : ' ', fp);
+		sample__fprintf_sym(sample, &al, 0, output[type].print_ip_opts,
+				    cursor, symbol_conf.bt_stop_list, fp);
+	}
+
+	fprintf(fp, "\n");
+
+	if (verbose > 0)
+		fflush(fp);
+
+out_put:
+	addr_location__exit(&al);
+	return ret;
+}
+
 // Used when scr->per_event_dump is not set
 static struct evsel_script es_stdout;
 
@@ -4303,6 +4391,7 @@ int cmd_script(int argc, const char **argv)
 
 	perf_tool__init(&script.tool, !unsorted_dump);
 	script.tool.sample		 = process_sample_event;
+	script.tool.callchain_deferred	 = process_deferred_sample_event;
 	script.tool.mmap		 = perf_event__process_mmap;
 	script.tool.mmap2		 = perf_event__process_mmap2;
 	script.tool.comm		 = perf_event__process_comm;
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 103984b29b1e10ae..10f1a03c28601e36 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -168,7 +168,10 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 				node_al.addr = addr;
 				node_al.map  = map__get(map);
 
-				if (print_symoffset) {
+				if (sample->deferred_callchain &&
+				    sample->deferred_cookie == node->ip) {
+					printed += fprintf(fp, "(cookie)");
+				} else if (print_symoffset) {
 					printed += __symbol__fprintf_symname_offs(sym, &node_al,
 										  print_unknown_as_addr,
 										  true, fp);
-- 
2.52.0.rc2.455.g230fcf2819-goog



* [PATCH v6 5/6] perf tools: Merge deferred user callchains
  2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
                   ` (3 preceding siblings ...)
  2025-11-20 23:48 ` [PATCH v6 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Namhyung Kim
@ 2025-11-20 23:48 ` Namhyung Kim
  2025-12-02 23:14   ` Ian Rogers
  2025-12-12 11:16   ` Jens Remus
  2025-11-20 23:48 ` [PATCH v6 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
  2025-12-03 17:58 ` [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
  6 siblings, 2 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-11-20 23:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

Save samples with deferred callchains in a separate list and deliver
them after merging the user callchains.  If users don't want to merge
they can set tool->merge_deferred_callchains to false to prevent the
behavior.
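
As a rough illustration of the merge step (a standalone sketch, not the
code in this patch): struct chain and merge_chains() below are made-up
names standing in for perf's callchain data.  The sketch assumes the
original sample's callchain ends with the cookie and that the deferred
record carries only the user entries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a callchain: a count followed by entries. */
struct chain {
	uint64_t nr;
	uint64_t ips[];
};

/*
 * Build a new chain from 'orig' without its last entry (the cookie),
 * followed by every entry of 'deferred'.  Returns NULL on bad input or
 * allocation failure; the caller frees the result.
 */
static struct chain *merge_chains(const struct chain *orig,
				  const struct chain *deferred)
{
	struct chain *merged;
	uint64_t nr_orig;

	assert(orig != NULL && deferred != NULL);

	/* need at least the deferred-context marker and the cookie */
	if (orig->nr < 2)
		return NULL;

	nr_orig = orig->nr - 1;	/* everything but the trailing cookie */

	/* one u64 for the count plus one per entry */
	merged = calloc(1 + nr_orig + deferred->nr, sizeof(uint64_t));
	if (merged == NULL)
		return NULL;

	merged->nr = nr_orig + deferred->nr;
	memcpy(merged->ips, orig->ips, nr_orig * sizeof(uint64_t));
	memcpy(&merged->ips[nr_orig], deferred->ips,
	       deferred->nr * sizeof(uint64_t));
	return merged;
}
```

The caller owns the returned buffer and has to free it once the merged
sample has been delivered.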

With the previous recording, perf script now shows the merged callchains.

  $ perf script
  ...
  pwd    2312   121.163435:     249113 cpu/cycles/P:
          ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
          ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
          ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
          ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
          ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
          ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
          ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
              7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
              7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
  ...

The old output is still available with the --no-merge-callchains option.
perf report also shows the user callchain entries at the end.

  $ perf report --no-children --stdio -q -S __build_id_parse.isra.0
  # symbol: __build_id_parse.isra.0
       8.40%  pwd      [kernel.kallsyms]
              |
              ---__build_id_parse.isra.0
                 perf_event_mmap
                 mprotect_fixup
                 do_mprotect_pkey
                 __x64_sys_mprotect
                 do_syscall_64
                 entry_SYSCALL_64_after_hwframe
                 mprotect
                 _dl_sysdep_start
                 _dl_start_user

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-script.txt |  5 ++
 tools/perf/builtin-inject.c              |  1 +
 tools/perf/builtin-report.c              |  1 +
 tools/perf/builtin-script.c              |  4 ++
 tools/perf/util/callchain.c              | 29 +++++++++
 tools/perf/util/callchain.h              |  3 +
 tools/perf/util/evlist.c                 |  1 +
 tools/perf/util/evlist.h                 |  2 +
 tools/perf/util/session.c                | 79 +++++++++++++++++++++++-
 tools/perf/util/tool.c                   |  2 +
 tools/perf/util/tool.h                   |  1 +
 11 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 28bec7e78bc858ba..03d1129606328d6d 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -527,6 +527,11 @@ include::itrace.txt[]
 	The known limitations include exception handing such as
 	setjmp/longjmp will have calls/returns not match.
 
+--merge-callchains::
+	Enable merging deferred user callchains if available.  This is the
+	default behavior.  If you want to see separate CALLCHAIN_DEFERRED
+	records for some reason, use --no-merge-callchains explicitly.
+
 :GMEXAMPLECMD: script
 :GMEXAMPLESUBCMD:
 include::guest-files.txt[]
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index bd9245d2dd41aa48..51d2721b6db9dccb 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -2527,6 +2527,7 @@ int cmd_inject(int argc, const char **argv)
 	inject.tool.auxtrace		= perf_event__repipe_auxtrace;
 	inject.tool.bpf_metadata	= perf_event__repipe_op2_synth;
 	inject.tool.dont_split_sample_group = true;
+	inject.tool.merge_deferred_callchains = false;
 	inject.session = __perf_session__new(&data, &inject.tool,
 					     /*trace_event_repipe=*/inject.output.is_pipe,
 					     /*host_env=*/NULL);
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2bc269f5fcef8023..add6b1c2aaf04270 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1614,6 +1614,7 @@ int cmd_report(int argc, const char **argv)
 	report.tool.event_update	 = perf_event__process_event_update;
 	report.tool.feature		 = process_feature_event;
 	report.tool.ordering_requires_timestamps = true;
+	report.tool.merge_deferred_callchains = !dump_trace;
 
 	session = perf_session__new(&data, &report.tool);
 	if (IS_ERR(session)) {
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 85b42205a71b3993..62e43d3c5ad731a0 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -4009,6 +4009,7 @@ int cmd_script(int argc, const char **argv)
 	bool header_only = false;
 	bool script_started = false;
 	bool unsorted_dump = false;
+	bool merge_deferred_callchains = true;
 	char *rec_script_path = NULL;
 	char *rep_script_path = NULL;
 	struct perf_session *session;
@@ -4162,6 +4163,8 @@ int cmd_script(int argc, const char **argv)
 		    "Guest code can be found in hypervisor process"),
 	OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
 		    "Enable LBR callgraph stitching approach"),
+	OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains,
+		    "Enable merging deferred user callchains"),
 	OPTS_EVSWITCH(&script.evswitch),
 	OPT_END()
 	};
@@ -4418,6 +4421,7 @@ int cmd_script(int argc, const char **argv)
 	script.tool.throttle		 = process_throttle_event;
 	script.tool.unthrottle		 = process_throttle_event;
 	script.tool.ordering_requires_timestamps = true;
+	script.tool.merge_deferred_callchains = merge_deferred_callchains;
 	session = perf_session__new(&data, &script.tool);
 	if (IS_ERR(session))
 		return PTR_ERR(session);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 2884187ccbbecfdc..71dc5a070065dd2a 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1838,3 +1838,32 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
 	}
 	return 0;
 }
+
+int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
+				     struct perf_sample *sample_callchain)
+{
+	u64 nr_orig = sample_orig->callchain->nr - 1;
+	u64 nr_deferred = sample_callchain->callchain->nr;
+	struct ip_callchain *callchain;
+
+	if (sample_orig->callchain->nr < 2) {
+		sample_orig->deferred_callchain = false;
+		return -EINVAL;
+	}
+
+	callchain = calloc(1 + nr_orig + nr_deferred, sizeof(u64));
+	if (callchain == NULL) {
+		sample_orig->deferred_callchain = false;
+		return -ENOMEM;
+	}
+
+	callchain->nr = nr_orig + nr_deferred;
+	/* copy original including PERF_CONTEXT_USER_DEFERRED (but not the cookie) */
+	memcpy(callchain->ips, sample_orig->callchain->ips, nr_orig * sizeof(u64));
+	/* copy deferred user callchains */
+	memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips,
+	       nr_deferred * sizeof(u64));
+
+	sample_orig->callchain = callchain;
+	return 0;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index d5ae4fbb7ce5fa44..2a52af8c80ace33c 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -318,4 +318,7 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
 				    struct perf_sample *sample, int max_stack,
 				    bool symbols, callchain_iter_fn cb, void *data);
 
+int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
+				     struct perf_sample *sample_callchain);
+
 #endif	/* __PERF_CALLCHAIN_H */
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e8217efdda5323c6..03674d2cbd015e4f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -85,6 +85,7 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_map *cpus,
 	evlist->ctl_fd.pos = -1;
 	evlist->nr_br_cntr = -1;
 	metricgroup__rblist_init(&evlist->metric_events);
+	INIT_LIST_HEAD(&evlist->deferred_samples);
 }
 
 struct evlist *evlist__new(void)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 5e71e3dc60423079..911834ae7c2a6f76 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -92,6 +92,8 @@ struct evlist {
 	 * of struct metric_expr.
 	 */
 	struct rblist	metric_events;
+	/* samples with deferred_callchain would wait here. */
+	struct list_head deferred_samples;
 };
 
 struct evsel_str_handler {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 361e15c1f26a96d0..dc570ad47ccc2c63 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1285,6 +1285,66 @@ static int evlist__deliver_sample(struct evlist *evlist, const struct perf_tool
 					    per_thread);
 }
 
+/*
+ * Samples with deferred callchains should wait for the next matching
+ * PERF_RECORD_CALLCHAIN_DEFERRED entries.  Keep the events in a list and
+ * deliver them once the matching callchains are found.
+ */
+struct deferred_event {
+	struct list_head list;
+	union perf_event *event;
+};
+
+static int evlist__deliver_deferred_callchain(struct evlist *evlist,
+					      const struct perf_tool *tool,
+					      union  perf_event *event,
+					      struct perf_sample *sample,
+					      struct machine *machine)
+{
+	struct deferred_event *de, *tmp;
+	struct evsel *evsel;
+	int ret = 0;
+
+	if (!tool->merge_deferred_callchains) {
+		evsel = evlist__id2evsel(evlist, sample->id);
+		return tool->callchain_deferred(tool, event, sample,
+						evsel, machine);
+	}
+
+	list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
+		struct perf_sample orig_sample;
+
+		ret = evlist__parse_sample(evlist, de->event, &orig_sample);
+		if (ret < 0) {
+			pr_err("failed to parse original sample\n");
+			break;
+		}
+
+		if (sample->tid != orig_sample.tid)
+			continue;
+
+		if (event->callchain_deferred.cookie == orig_sample.deferred_cookie)
+			sample__merge_deferred_callchain(&orig_sample, sample);
+		else
+			orig_sample.deferred_callchain = false;
+
+		evsel = evlist__id2evsel(evlist, orig_sample.id);
+		ret = evlist__deliver_sample(evlist, tool, de->event,
+					     &orig_sample, evsel, machine);
+
+		if (orig_sample.deferred_callchain)
+			free(orig_sample.callchain);
+
+		list_del(&de->list);
+		free(de->event);
+		free(de);
+
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
 static int machines__deliver_event(struct machines *machines,
 				   struct evlist *evlist,
 				   union perf_event *event,
@@ -1313,6 +1373,22 @@ static int machines__deliver_event(struct machines *machines,
 			return 0;
 		}
 		dump_sample(evsel, event, sample, perf_env__arch(machine->env));
+		if (sample->deferred_callchain && tool->merge_deferred_callchains) {
+			struct deferred_event *de = malloc(sizeof(*de));
+			size_t sz = event->header.size;
+
+			if (de == NULL)
+				return -ENOMEM;
+
+			de->event = malloc(sz);
+			if (de->event == NULL) {
+				free(de);
+				return -ENOMEM;
+			}
+			memcpy(de->event, event, sz);
+			list_add_tail(&de->list, &evlist->deferred_samples);
+			return 0;
+		}
 		return evlist__deliver_sample(evlist, tool, event, sample, evsel, machine);
 	case PERF_RECORD_MMAP:
 		return tool->mmap(tool, event, sample, machine);
@@ -1372,7 +1448,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->aux_output_hw_id(tool, event, sample, machine);
 	case PERF_RECORD_CALLCHAIN_DEFERRED:
 		dump_deferred_callchain(evsel, event, sample);
-		return tool->callchain_deferred(tool, event, sample, evsel, machine);
+		return evlist__deliver_deferred_callchain(evlist, tool, event,
+							  sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c
index e77f0e2ecc1f79db..27ba5849c74a2e7d 100644
--- a/tools/perf/util/tool.c
+++ b/tools/perf/util/tool.c
@@ -266,6 +266,7 @@ void perf_tool__init(struct perf_tool *tool, bool ordered_events)
 	tool->cgroup_events = false;
 	tool->no_warn = false;
 	tool->show_feat_hdr = SHOW_FEAT_NO_HEADER;
+	tool->merge_deferred_callchains = true;
 
 	tool->sample = process_event_sample_stub;
 	tool->mmap = process_event_stub;
@@ -448,6 +449,7 @@ void delegate_tool__init(struct delegate_tool *tool, struct perf_tool *delegate)
 	tool->tool.cgroup_events = delegate->cgroup_events;
 	tool->tool.no_warn = delegate->no_warn;
 	tool->tool.show_feat_hdr = delegate->show_feat_hdr;
+	tool->tool.merge_deferred_callchains = delegate->merge_deferred_callchains;
 
 	tool->tool.sample = delegate_sample;
 	tool->tool.read = delegate_read;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 9b9f0a8cbf3de4b5..e96b69d25a5b737d 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -90,6 +90,7 @@ struct perf_tool {
 	bool		cgroup_events;
 	bool		no_warn;
 	bool		dont_split_sample_group;
+	bool		merge_deferred_callchains;
 	enum show_feature_header show_feat_hdr;
 };
 
-- 
2.52.0.rc2.455.g230fcf2819-goog



* [PATCH v6 6/6] perf tools: Flush remaining samples w/o deferred callchains
  2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
                   ` (4 preceding siblings ...)
  2025-11-20 23:48 ` [PATCH v6 5/6] perf tools: Merge deferred user callchains Namhyung Kim
@ 2025-11-20 23:48 ` Namhyung Kim
  2025-12-02 23:15   ` Ian Rogers
  2025-12-03 17:58 ` [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
  6 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2025-11-20 23:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

It's possible that some kernel samples don't have matching deferred
callchain records when the profiling session ends before the threads
return to userspace.  Flush those samples before finishing the
session.
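
To make the queue-then-flush flow concrete, here is a small standalone
sketch.  It is not the session.c code: struct pending, struct queue and
the queue_*() helpers are hypothetical names, and "delivery" is reduced
to bumping a counter.  It assumes, as in the previous patch, that a
deferred callchain record releases every queued sample of its thread,
merging only the ones whose cookie matches.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical queued sample: only the fields used for matching. */
struct pending {
	struct pending *next;
	int tid;
	uint64_t cookie;
};

struct queue {
	struct pending *head;
	struct pending **tail;
	unsigned int merged;	/* delivered with a user callchain */
	unsigned int plain;	/* delivered without one */
};

static void queue_init(struct queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
	q->merged = 0;
	q->plain = 0;
}

/* Queue a sample that is still waiting for its user callchain. */
static int queue_add(struct queue *q, int tid, uint64_t cookie)
{
	struct pending *p = malloc(sizeof(*p));

	if (p == NULL)
		return -1;

	p->next = NULL;
	p->tid = tid;
	p->cookie = cookie;
	*q->tail = p;
	q->tail = &p->next;
	return 0;
}

/* A deferred callchain record arrived for (tid, cookie). */
static void queue_deliver(struct queue *q, int tid, uint64_t cookie)
{
	struct pending **pp = &q->head;

	while (*pp != NULL) {
		struct pending *p = *pp;

		if (p->tid != tid) {
			pp = &p->next;
			continue;
		}
		if (p->cookie == cookie)
			q->merged++;
		else
			q->plain++;
		*pp = p->next;
		free(p);
	}
	q->tail = pp;	/* pp now points at the terminating NULL link */
}

/* End of data: nothing more can match, deliver the rest unmerged. */
static void queue_flush(struct queue *q)
{
	while (q->head != NULL) {
		struct pending *p = q->head;

		q->head = p->next;
		free(p);
		q->plain++;
	}
	q->tail = &q->head;
}
```

Without the final queue_flush() call, samples from threads that never
returned to userspace would stay queued and never reach the tool.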

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/session.c | 50 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dc570ad47ccc2c63..4236503c8f6c1350 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1295,6 +1295,10 @@ struct deferred_event {
 	union perf_event *event;
 };
 
+/*
+ * This is called when a deferred callchain record comes up.  Find all matching
+ * samples, merge the callchains and process them.
+ */
 static int evlist__deliver_deferred_callchain(struct evlist *evlist,
 					      const struct perf_tool *tool,
 					      union  perf_event *event,
@@ -1345,6 +1349,42 @@ static int evlist__deliver_deferred_callchain(struct evlist *evlist,
 	return ret;
 }
 
+/*
+ * This is called at the end of the data processing for the session.  Flush the
+ * remaining samples as there's no hope for matching deferred callchains.
+ */
+static int session__flush_deferred_samples(struct perf_session *session,
+					   const struct perf_tool *tool)
+{
+	struct evlist *evlist = session->evlist;
+	struct machine *machine = &session->machines.host;
+	struct deferred_event *de, *tmp;
+	struct evsel *evsel;
+	int ret = 0;
+
+	list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
+		struct perf_sample sample;
+
+		ret = evlist__parse_sample(evlist, de->event, &sample);
+		if (ret < 0) {
+			pr_err("failed to parse original sample\n");
+			break;
+		}
+
+		evsel = evlist__id2evsel(evlist, sample.id);
+		ret = evlist__deliver_sample(evlist, tool, de->event,
+					     &sample, evsel, machine);
+
+		list_del(&de->list);
+		free(de->event);
+		free(de);
+
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
 static int machines__deliver_event(struct machines *machines,
 				   struct evlist *evlist,
 				   union perf_event *event,
@@ -2038,6 +2078,9 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
 done:
 	/* do the final flush for ordered samples */
 	err = ordered_events__flush(oe, OE_FLUSH__FINAL);
+	if (err)
+		goto out_err;
+	err = session__flush_deferred_samples(session, tool);
 	if (err)
 		goto out_err;
 	err = auxtrace__flush_events(session, tool);
@@ -2384,6 +2427,9 @@ static int __perf_session__process_events(struct perf_session *session)
 	if (err)
 		goto out_err;
 	err = auxtrace__flush_events(session, tool);
+	if (err)
+		goto out_err;
+	err = session__flush_deferred_samples(session, tool);
 	if (err)
 		goto out_err;
 	err = perf_session__flush_thread_stacks(session);
@@ -2506,6 +2552,10 @@ static int __perf_session__process_dir_events(struct perf_session *session)
 	if (ret)
 		goto out_err;
 
+	ret = session__flush_deferred_samples(session, tool);
+	if (ret)
+		goto out_err;
+
 	ret = perf_session__flush_thread_stacks(session);
 out_err:
 	ui_progress__finish();
-- 
2.52.0.rc2.455.g230fcf2819-goog



* Re: [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains
  2025-11-20 23:48 ` [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
@ 2025-11-21  6:26   ` Thomas Richter
  2025-11-24 20:27     ` Namhyung Kim
  2025-12-03  5:49   ` Namhyung Kim
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Richter @ 2025-11-21  6:26 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

On 11/21/25 00:48, Namhyung Kim wrote:
> Add a new callchain record mode option for deferred callchains.  For now
> it only works with FP (frame-pointer) mode.
> 
> And add the missing feature detection logic to clear the flag on old
> kernels.
> 
>   $ perf record --call-graph fp,defer -vv true

Does this also work for the dwarf format?
    # perf record --call-graph dwarf,defer ....
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294


* Re: [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains
  2025-11-21  6:26   ` Thomas Richter
@ 2025-11-24 20:27     ` Namhyung Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-11-24 20:27 UTC (permalink / raw)
  To: Thomas Richter
  Cc: Arnaldo Carvalho de Melo, Ian Rogers, James Clark, Jiri Olsa,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

Hello,

On Fri, Nov 21, 2025 at 07:26:34AM +0100, Thomas Richter wrote:
> On 11/21/25 00:48, Namhyung Kim wrote:
> > Add a new callchain record mode option for deferred callchains.  For now
> > it only works with FP (frame-pointer) mode.
> > 
> > And add the missing feature detection logic to clear the flag on old
> > kernels.
> > 
> >   $ perf record --call-graph fp,defer -vv true
> 
> Does this also work for the dwarf format?
>     # perf record --call-graph dwarf,defer ....

No, it's not supported.  Maybe we could do something similar for dwarf:
just dump the stack contents and registers before returning to userspace.

Thanks,
Namhyung



* Re: [PATCH v6 5/6] perf tools: Merge deferred user callchains
  2025-11-20 23:48 ` [PATCH v6 5/6] perf tools: Merge deferred user callchains Namhyung Kim
@ 2025-12-02 23:14   ` Ian Rogers
  2025-12-03  0:01     ` Namhyung Kim
  2025-12-12 11:16   ` Jens Remus
  1 sibling, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2025-12-02 23:14 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, James Clark, Jiri Olsa, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
	Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Jens Remus,
	Mathieu Desnoyers, linux-trace-kernel, bpf

On Thu, Nov 20, 2025 at 3:48 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Save samples with deferred callchains in a separate list and deliver
> them after merging the user callchains.  If users don't want to merge
> they can set tool->merge_deferred_callchains to false to prevent the
> behavior.
>
> With the previous recording, perf script now shows the merged callchains.
>
>   $ perf script
>   ...
>   pwd    2312   121.163435:     249113 cpu/cycles/P:
>           ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
>           ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
>           ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
>           ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
>           ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
>           ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
>           ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
>               7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>               7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>               7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>   ...
>
> The old output is still available with the --no-merge-callchains option.
> perf report also shows the user callchain entries at the end.
>
>   $ perf report --no-children --stdio -q -S __build_id_parse.isra.0
>   # symbol: __build_id_parse.isra.0
>        8.40%  pwd      [kernel.kallsyms]
>               |
>               ---__build_id_parse.isra.0
>                  perf_event_mmap
>                  mprotect_fixup
>                  do_mprotect_pkey
>                  __x64_sys_mprotect
>                  do_syscall_64
>                  entry_SYSCALL_64_after_hwframe
>                  mprotect
>                  _dl_sysdep_start
>                  _dl_start_user
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Reviewed-by: Ian Rogers <irogers@google.com>

> ---
>  tools/perf/Documentation/perf-script.txt |  5 ++
>  tools/perf/builtin-inject.c              |  1 +
>  tools/perf/builtin-report.c              |  1 +
>  tools/perf/builtin-script.c              |  4 ++
>  tools/perf/util/callchain.c              | 29 +++++++++
>  tools/perf/util/callchain.h              |  3 +
>  tools/perf/util/evlist.c                 |  1 +
>  tools/perf/util/evlist.h                 |  2 +
>  tools/perf/util/session.c                | 79 +++++++++++++++++++++++-
>  tools/perf/util/tool.c                   |  2 +
>  tools/perf/util/tool.h                   |  1 +
>  11 files changed, 127 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> index 28bec7e78bc858ba..03d1129606328d6d 100644
> --- a/tools/perf/Documentation/perf-script.txt
> +++ b/tools/perf/Documentation/perf-script.txt
> @@ -527,6 +527,11 @@ include::itrace.txt[]
>         The known limitations include exception handing such as
>         setjmp/longjmp will have calls/returns not match.
>
> +--merge-callchains::
> +       Enable merging deferred user callchains if available.  This is the
> +       default behavior.  If you want to see separate CALLCHAIN_DEFERRED
> +       records for some reason, use --no-merge-callchains explicitly.
> +
>  :GMEXAMPLECMD: script
>  :GMEXAMPLESUBCMD:
>  include::guest-files.txt[]
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index bd9245d2dd41aa48..51d2721b6db9dccb 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -2527,6 +2527,7 @@ int cmd_inject(int argc, const char **argv)
>         inject.tool.auxtrace            = perf_event__repipe_auxtrace;
>         inject.tool.bpf_metadata        = perf_event__repipe_op2_synth;
>         inject.tool.dont_split_sample_group = true;
> +       inject.tool.merge_deferred_callchains = false;
>         inject.session = __perf_session__new(&data, &inject.tool,
>                                              /*trace_event_repipe=*/inject.output.is_pipe,
>                                              /*host_env=*/NULL);
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 2bc269f5fcef8023..add6b1c2aaf04270 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1614,6 +1614,7 @@ int cmd_report(int argc, const char **argv)
>         report.tool.event_update         = perf_event__process_event_update;
>         report.tool.feature              = process_feature_event;
>         report.tool.ordering_requires_timestamps = true;
> +       report.tool.merge_deferred_callchains = !dump_trace;
>
>         session = perf_session__new(&data, &report.tool);
>         if (IS_ERR(session)) {
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 85b42205a71b3993..62e43d3c5ad731a0 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -4009,6 +4009,7 @@ int cmd_script(int argc, const char **argv)
>         bool header_only = false;
>         bool script_started = false;
>         bool unsorted_dump = false;
> +       bool merge_deferred_callchains = true;
>         char *rec_script_path = NULL;
>         char *rep_script_path = NULL;
>         struct perf_session *session;
> @@ -4162,6 +4163,8 @@ int cmd_script(int argc, const char **argv)
>                     "Guest code can be found in hypervisor process"),
>         OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
>                     "Enable LBR callgraph stitching approach"),
> +       OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains,
> +                   "Enable merging deferred user callchains"),
>         OPTS_EVSWITCH(&script.evswitch),
>         OPT_END()
>         };
> @@ -4418,6 +4421,7 @@ int cmd_script(int argc, const char **argv)
>         script.tool.throttle             = process_throttle_event;
>         script.tool.unthrottle           = process_throttle_event;
>         script.tool.ordering_requires_timestamps = true;
> +       script.tool.merge_deferred_callchains = merge_deferred_callchains;
>         session = perf_session__new(&data, &script.tool);
>         if (IS_ERR(session))
>                 return PTR_ERR(session);
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 2884187ccbbecfdc..71dc5a070065dd2a 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -1838,3 +1838,32 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
>         }
>         return 0;
>  }
> +
> +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,

nit: We use the term deferred rather than original except in this
context. I think deferred is a little more intention revealing than
original. Perhaps add a comment capturing that the original sample is
the deferred kernel sample.

Thanks,
Ian

> +                                    struct perf_sample *sample_callchain)
> +{
> +       u64 nr_orig = sample_orig->callchain->nr - 1;
> +       u64 nr_deferred = sample_callchain->callchain->nr;
> +       struct ip_callchain *callchain;
> +
> +       if (sample_orig->callchain->nr < 2) {
> +               sample_orig->deferred_callchain = false;
> +               return -EINVAL;
> +       }
> +
> +       callchain = calloc(1 + nr_orig + nr_deferred, sizeof(u64));
> +       if (callchain == NULL) {
> +               sample_orig->deferred_callchain = false;
> +               return -ENOMEM;
> +       }
> +
> +       callchain->nr = nr_orig + nr_deferred;
> +       /* copy original including PERF_CONTEXT_USER_DEFERRED (but not the cookie) */
> +       memcpy(callchain->ips, sample_orig->callchain->ips, nr_orig * sizeof(u64));
> +       /* copy deferred user callchains */
> +       memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips,
> +              nr_deferred * sizeof(u64));
> +
> +       sample_orig->callchain = callchain;
> +       return 0;
> +}
> diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
> index d5ae4fbb7ce5fa44..2a52af8c80ace33c 100644
> --- a/tools/perf/util/callchain.h
> +++ b/tools/perf/util/callchain.h
> @@ -318,4 +318,7 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
>                                     struct perf_sample *sample, int max_stack,
>                                     bool symbols, callchain_iter_fn cb, void *data);
>
> +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
> +                                    struct perf_sample *sample_callchain);
> +
>  #endif /* __PERF_CALLCHAIN_H */
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index e8217efdda5323c6..03674d2cbd015e4f 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -85,6 +85,7 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_map *cpus,
>         evlist->ctl_fd.pos = -1;
>         evlist->nr_br_cntr = -1;
>         metricgroup__rblist_init(&evlist->metric_events);
> +       INIT_LIST_HEAD(&evlist->deferred_samples);
>  }
>
>  struct evlist *evlist__new(void)
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 5e71e3dc60423079..911834ae7c2a6f76 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -92,6 +92,8 @@ struct evlist {
>          * of struct metric_expr.
>          */
>         struct rblist   metric_events;
> +       /* samples with deferred_callchain would wait here. */
> +       struct list_head deferred_samples;
>  };
>
>  struct evsel_str_handler {
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 361e15c1f26a96d0..dc570ad47ccc2c63 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1285,6 +1285,66 @@ static int evlist__deliver_sample(struct evlist *evlist, const struct perf_tool
>                                             per_thread);
>  }
>
> +/*
> + * Samples with deferred callchains should wait for the next matching
> + * PERF_RECORD_CALLCHAIN_DEFERRED entries.  Keep the events in a list and
> + * deliver them once it finds the callchains.
> + */
> +struct deferred_event {
> +       struct list_head list;
> +       union perf_event *event;
> +};
> +
> +static int evlist__deliver_deferred_callchain(struct evlist *evlist,
> +                                             const struct perf_tool *tool,
> +                                             union  perf_event *event,
> +                                             struct perf_sample *sample,
> +                                             struct machine *machine)
> +{
> +       struct deferred_event *de, *tmp;
> +       struct evsel *evsel;
> +       int ret = 0;
> +
> +       if (!tool->merge_deferred_callchains) {
> +               evsel = evlist__id2evsel(evlist, sample->id);
> +               return tool->callchain_deferred(tool, event, sample,
> +                                               evsel, machine);
> +       }
> +
> +       list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
> +               struct perf_sample orig_sample;
> +
> +               ret = evlist__parse_sample(evlist, de->event, &orig_sample);
> +               if (ret < 0) {
> +                       pr_err("failed to parse original sample\n");
> +                       break;
> +               }
> +
> +               if (sample->tid != orig_sample.tid)
> +                       continue;
> +
> +               if (event->callchain_deferred.cookie == orig_sample.deferred_cookie)
> +                       sample__merge_deferred_callchain(&orig_sample, sample);
> +               else
> +                       orig_sample.deferred_callchain = false;
> +
> +               evsel = evlist__id2evsel(evlist, orig_sample.id);
> +               ret = evlist__deliver_sample(evlist, tool, de->event,
> +                                            &orig_sample, evsel, machine);
> +
> +               if (orig_sample.deferred_callchain)
> +                       free(orig_sample.callchain);
> +
> +               list_del(&de->list);
> +               free(de->event);
> +               free(de);
> +
> +               if (ret)
> +                       break;
> +       }
> +       return ret;
> +}
> +
>  static int machines__deliver_event(struct machines *machines,
>                                    struct evlist *evlist,
>                                    union perf_event *event,
> @@ -1313,6 +1373,22 @@ static int machines__deliver_event(struct machines *machines,
>                         return 0;
>                 }
>                 dump_sample(evsel, event, sample, perf_env__arch(machine->env));
> +               if (sample->deferred_callchain && tool->merge_deferred_callchains) {
> +                       struct deferred_event *de = malloc(sizeof(*de));
> +                       size_t sz = event->header.size;
> +
> +                       if (de == NULL)
> +                               return -ENOMEM;
> +
> +                       de->event = malloc(sz);
> +                       if (de->event == NULL) {
> +                               free(de);
> +                               return -ENOMEM;
> +                       }
> +                       memcpy(de->event, event, sz);
> +                       list_add_tail(&de->list, &evlist->deferred_samples);
> +                       return 0;
> +               }
>                 return evlist__deliver_sample(evlist, tool, event, sample, evsel, machine);
>         case PERF_RECORD_MMAP:
>                 return tool->mmap(tool, event, sample, machine);
> @@ -1372,7 +1448,8 @@ static int machines__deliver_event(struct machines *machines,
>                 return tool->aux_output_hw_id(tool, event, sample, machine);
>         case PERF_RECORD_CALLCHAIN_DEFERRED:
>                 dump_deferred_callchain(evsel, event, sample);
> -               return tool->callchain_deferred(tool, event, sample, evsel, machine);
> +               return evlist__deliver_deferred_callchain(evlist, tool, event,
> +                                                         sample, machine);
>         default:
>                 ++evlist->stats.nr_unknown_events;
>                 return -1;
> diff --git a/tools/perf/util/tool.c b/tools/perf/util/tool.c
> index e77f0e2ecc1f79db..27ba5849c74a2e7d 100644
> --- a/tools/perf/util/tool.c
> +++ b/tools/perf/util/tool.c
> @@ -266,6 +266,7 @@ void perf_tool__init(struct perf_tool *tool, bool ordered_events)
>         tool->cgroup_events = false;
>         tool->no_warn = false;
>         tool->show_feat_hdr = SHOW_FEAT_NO_HEADER;
> +       tool->merge_deferred_callchains = true;
>
>         tool->sample = process_event_sample_stub;
>         tool->mmap = process_event_stub;
> @@ -448,6 +449,7 @@ void delegate_tool__init(struct delegate_tool *tool, struct perf_tool *delegate)
>         tool->tool.cgroup_events = delegate->cgroup_events;
>         tool->tool.no_warn = delegate->no_warn;
>         tool->tool.show_feat_hdr = delegate->show_feat_hdr;
> +       tool->tool.merge_deferred_callchains = delegate->merge_deferred_callchains;
>
>         tool->tool.sample = delegate_sample;
>         tool->tool.read = delegate_read;
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 9b9f0a8cbf3de4b5..e96b69d25a5b737d 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -90,6 +90,7 @@ struct perf_tool {
>         bool            cgroup_events;
>         bool            no_warn;
>         bool            dont_split_sample_group;
> +       bool            merge_deferred_callchains;
>         enum show_feature_header show_feat_hdr;
>  };
>
> --
> 2.52.0.rc2.455.g230fcf2819-goog
>

^ permalink raw reply	[flat|nested] 17+ messages in thread
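
For readers following the review, the merge step in the patch above can be
sketched in isolation.  This is only a minimal illustration under stated
assumptions: struct chain is a simplified stand-in for perf's struct
ip_callchain, and chain_new() and merge_chains() are hypothetical helper
names, not functions in the perf tree.  The last entry of the original chain
is treated as the cookie and dropped, mirroring the "nr - 1" in
sample__merge_deferred_callchain().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for perf's struct ip_callchain (assumption). */
struct chain {
	uint64_t nr;
	uint64_t ips[];
};

/* Hypothetical helper: heap-allocate a chain holding a copy of 'ips'. */
static struct chain *chain_new(const uint64_t *ips, uint64_t nr)
{
	struct chain *c = calloc(1 + nr, sizeof(uint64_t));

	if (c == NULL)
		return NULL;
	c->nr = nr;
	if (nr)
		memcpy(c->ips, ips, nr * sizeof(uint64_t));
	return c;
}

/*
 * Build a new chain from the kernel part of 'orig' followed by all of
 * 'deferred'.  The final entry of 'orig' (the cookie) is not copied.
 * Returns NULL if 'orig' is too short or the allocation fails.
 */
static struct chain *merge_chains(const struct chain *orig,
				  const struct chain *deferred)
{
	uint64_t nr_orig, nr_deferred;
	struct chain *merged;

	assert(orig != NULL && deferred != NULL);

	/* need at least the deferred-context marker and the cookie */
	if (orig->nr < 2)
		return NULL;

	nr_orig = orig->nr - 1;
	nr_deferred = deferred->nr;

	/* one extra u64 for the 'nr' field itself */
	merged = calloc(1 + nr_orig + nr_deferred, sizeof(uint64_t));
	if (merged == NULL)
		return NULL;

	merged->nr = nr_orig + nr_deferred;
	memcpy(merged->ips, orig->ips, nr_orig * sizeof(uint64_t));
	if (nr_deferred)
		memcpy(&merged->ips[nr_orig], deferred->ips,
		       nr_deferred * sizeof(uint64_t));
	return merged;
}
```

Unlike the patch, the sketch returns the merged chain instead of replacing a
pointer inside the sample, so the caller's ownership of the result is
explicit.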

* Re: [PATCH v6 6/6] perf tools: Flush remaining samples w/o deferred callchains
  2025-11-20 23:48 ` [PATCH v6 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
@ 2025-12-02 23:15   ` Ian Rogers
  0 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2025-12-02 23:15 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, James Clark, Jiri Olsa, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
	Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Jens Remus,
	Mathieu Desnoyers, linux-trace-kernel, bpf

On Thu, Nov 20, 2025 at 3:48 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> It's possible that some kernel samples don't have matching deferred
> callchain records when the profiling session was ended before the
> threads came back to userspace.  Let's flush the samples before
> finishing the session.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/session.c | 50 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index dc570ad47ccc2c63..4236503c8f6c1350 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1295,6 +1295,10 @@ struct deferred_event {
>         union perf_event *event;
>  };
>
> +/*
> + * This is called when a deferred callchain record comes up.  Find all matching
> + * samples, merge the callchains and process them.
> + */
>  static int evlist__deliver_deferred_callchain(struct evlist *evlist,
>                                               const struct perf_tool *tool,
>                                               union  perf_event *event,
> @@ -1345,6 +1349,42 @@ static int evlist__deliver_deferred_callchain(struct evlist *evlist,
>         return ret;
>  }
>
> +/*
> + * This is called at the end of the data processing for the session.  Flush the
> + * remaining samples as there's no hope for matching deferred callchains.
> + */
> +static int session__flush_deferred_samples(struct perf_session *session,
> +                                          const struct perf_tool *tool)
> +{
> +       struct evlist *evlist = session->evlist;
> +       struct machine *machine = &session->machines.host;
> +       struct deferred_event *de, *tmp;
> +       struct evsel *evsel;
> +       int ret = 0;
> +
> +       list_for_each_entry_safe(de, tmp, &evlist->deferred_samples, list) {
> +               struct perf_sample sample;
> +
> +               ret = evlist__parse_sample(evlist, de->event, &sample);
> +               if (ret < 0) {
> +                       pr_err("failed to parse original sample\n");
> +                       break;
> +               }
> +
> +               evsel = evlist__id2evsel(evlist, sample.id);
> +               ret = evlist__deliver_sample(evlist, tool, de->event,
> +                                            &sample, evsel, machine);
> +
> +               list_del(&de->list);
> +               free(de->event);
> +               free(de);
> +
> +               if (ret)
> +                       break;
> +       }
> +       return ret;
> +}
> +
>  static int machines__deliver_event(struct machines *machines,
>                                    struct evlist *evlist,
>                                    union perf_event *event,
> @@ -2038,6 +2078,9 @@ static int __perf_session__process_pipe_events(struct perf_session *session)
>  done:
>         /* do the final flush for ordered samples */
>         err = ordered_events__flush(oe, OE_FLUSH__FINAL);
> +       if (err)
> +               goto out_err;
> +       err = session__flush_deferred_samples(session, tool);
>         if (err)
>                 goto out_err;
>         err = auxtrace__flush_events(session, tool);
> @@ -2384,6 +2427,9 @@ static int __perf_session__process_events(struct perf_session *session)
>         if (err)
>                 goto out_err;
>         err = auxtrace__flush_events(session, tool);
> +       if (err)
> +               goto out_err;
> +       err = session__flush_deferred_samples(session, tool);
>         if (err)
>                 goto out_err;
>         err = perf_session__flush_thread_stacks(session);
> @@ -2506,6 +2552,10 @@ static int __perf_session__process_dir_events(struct perf_session *session)
>         if (ret)
>                 goto out_err;
>
> +       ret = session__flush_deferred_samples(session, tool);
> +       if (ret)
> +               goto out_err;
> +
>         ret = perf_session__flush_thread_stacks(session);
>  out_err:
>         ui_progress__finish();
> --
> 2.52.0.rc2.455.g230fcf2819-goog
>

^ permalink raw reply	[flat|nested] 17+ messages in thread
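
The queueing and flushing behaviour described in patches 5/6 and 6/6 can be
sketched with a small stand-alone list.  Everything here is hypothetical
(struct pending, pending_resolve(), pending_flush(), count_delivery() and so
on are illustration names, not perf code), and a plain singly linked list
replaces the kernel-style list_head.  A callchain record for a tid resolves
every pending sample of that tid: merged when the cookie matches, delivered
unmerged otherwise.  Whatever is still queued at the end is flushed unmerged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified pending sample (not perf's struct deferred_event). */
struct pending {
	struct pending *next;
	uint32_t tid;
	uint64_t cookie;
};

struct pending_list {
	struct pending *head;
	struct pending **tail;	/* address of the last 'next' link */
};

typedef void (*deliver_fn)(const struct pending *p, bool merged, void *data);

/* Example callback that only counts deliveries. */
struct counts {
	int merged;
	int unmerged;
};

static void count_delivery(const struct pending *p, bool merged, void *data)
{
	struct counts *c = data;

	(void)p;
	if (merged)
		c->merged++;
	else
		c->unmerged++;
}

static void pending_init(struct pending_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Queue a sample that is still waiting for its user callchain. */
static int pending_add(struct pending_list *l, uint32_t tid, uint64_t cookie)
{
	struct pending *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return -1;
	p->tid = tid;
	p->cookie = cookie;
	*l->tail = p;
	l->tail = &p->next;
	return 0;
}

/* A callchain record for (tid, cookie) arrived: deliver that tid's samples. */
static int pending_resolve(struct pending_list *l, uint32_t tid,
			   uint64_t cookie, deliver_fn cb, void *data)
{
	struct pending **pp = &l->head;
	int n = 0;

	assert(cb != NULL);
	while (*pp) {
		struct pending *p = *pp;

		if (p->tid != tid) {
			pp = &p->next;
			continue;
		}
		cb(p, p->cookie == cookie, data);
		*pp = p->next;	/* unlink before freeing */
		free(p);
		n++;
	}
	l->tail = pp;
	return n;
}

/* End of input: nothing can match any more, deliver the rest unmerged. */
static int pending_flush(struct pending_list *l, deliver_fn cb, void *data)
{
	int n = 0;

	assert(cb != NULL);
	while (l->head) {
		struct pending *p = l->head;

		l->head = p->next;
		cb(p, false, data);
		free(p);
		n++;
	}
	l->tail = &l->head;
	return n;
}
```

The sketch leaves out error propagation from the callback; the patch stops
iterating as soon as a delivery returns non-zero.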

* Re: [PATCH v6 5/6] perf tools: Merge deferred user callchains
  2025-12-02 23:14   ` Ian Rogers
@ 2025-12-03  0:01     ` Namhyung Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-12-03  0:01 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, James Clark, Jiri Olsa, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
	Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Jens Remus,
	Mathieu Desnoyers, linux-trace-kernel, bpf

On Tue, Dec 02, 2025 at 03:14:31PM -0800, Ian Rogers wrote:
> On Thu, Nov 20, 2025 at 3:48 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Save samples with deferred callchains in a separate list and deliver
> > them after merging the user callchains.  If users don't want to merge
> > they can set tool->merge_deferred_callchains to false to prevent the
> > behavior.
> >
> > With previous result, now perf script will show the merged callchains.
> >
> >   $ perf script
> >   ...
> >   pwd    2312   121.163435:     249113 cpu/cycles/P:
> >           ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
> >           ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
> >           ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
> >           ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
> >           ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
> >           ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
> >           ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
> >               7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> >               7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> >               7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> >   ...
> >
> > > The old output can be obtained with the --no-merge-callchains option.
> > Also perf report can get the user callchain entry at the end.
> >
> >   $ perf report --no-children --stdio -q -S __build_id_parse.isra.0
> >   # symbol: __build_id_parse.isra.0
> >        8.40%  pwd      [kernel.kallsyms]
> >               |
> >               ---__build_id_parse.isra.0
> >                  perf_event_mmap
> >                  mprotect_fixup
> >                  do_mprotect_pkey
> >                  __x64_sys_mprotect
> >                  do_syscall_64
> >                  entry_SYSCALL_64_after_hwframe
> >                  mprotect
> >                  _dl_sysdep_start
> >                  _dl_start_user
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> 
> Reviewed-by: Ian Rogers <irogers@google.com>
> 
> > ---
> >  tools/perf/Documentation/perf-script.txt |  5 ++
> >  tools/perf/builtin-inject.c              |  1 +
> >  tools/perf/builtin-report.c              |  1 +
> >  tools/perf/builtin-script.c              |  4 ++
> >  tools/perf/util/callchain.c              | 29 +++++++++
> >  tools/perf/util/callchain.h              |  3 +
> >  tools/perf/util/evlist.c                 |  1 +
> >  tools/perf/util/evlist.h                 |  2 +
> >  tools/perf/util/session.c                | 79 +++++++++++++++++++++++-
> >  tools/perf/util/tool.c                   |  2 +
> >  tools/perf/util/tool.h                   |  1 +
> >  11 files changed, 127 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> > index 28bec7e78bc858ba..03d1129606328d6d 100644
> > --- a/tools/perf/Documentation/perf-script.txt
> > +++ b/tools/perf/Documentation/perf-script.txt
> > @@ -527,6 +527,11 @@ include::itrace.txt[]
> >         The known limitations include exception handing such as
> >         setjmp/longjmp will have calls/returns not match.
> >
> > +--merge-callchains::
> > +       Enable merging deferred user callchains if available.  This is the
> > +       default behavior.  If you want to see separate CALLCHAIN_DEFERRED
> > +       records for some reason, use --no-merge-callchains explicitly.
> > +
> >  :GMEXAMPLECMD: script
> >  :GMEXAMPLESUBCMD:
> >  include::guest-files.txt[]
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index bd9245d2dd41aa48..51d2721b6db9dccb 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -2527,6 +2527,7 @@ int cmd_inject(int argc, const char **argv)
> >         inject.tool.auxtrace            = perf_event__repipe_auxtrace;
> >         inject.tool.bpf_metadata        = perf_event__repipe_op2_synth;
> >         inject.tool.dont_split_sample_group = true;
> > +       inject.tool.merge_deferred_callchains = false;
> >         inject.session = __perf_session__new(&data, &inject.tool,
> >                                              /*trace_event_repipe=*/inject.output.is_pipe,
> >                                              /*host_env=*/NULL);
> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> > index 2bc269f5fcef8023..add6b1c2aaf04270 100644
> > --- a/tools/perf/builtin-report.c
> > +++ b/tools/perf/builtin-report.c
> > @@ -1614,6 +1614,7 @@ int cmd_report(int argc, const char **argv)
> >         report.tool.event_update         = perf_event__process_event_update;
> >         report.tool.feature              = process_feature_event;
> >         report.tool.ordering_requires_timestamps = true;
> > +       report.tool.merge_deferred_callchains = !dump_trace;
> >
> >         session = perf_session__new(&data, &report.tool);
> >         if (IS_ERR(session)) {
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index 85b42205a71b3993..62e43d3c5ad731a0 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -4009,6 +4009,7 @@ int cmd_script(int argc, const char **argv)
> >         bool header_only = false;
> >         bool script_started = false;
> >         bool unsorted_dump = false;
> > +       bool merge_deferred_callchains = true;
> >         char *rec_script_path = NULL;
> >         char *rep_script_path = NULL;
> >         struct perf_session *session;
> > @@ -4162,6 +4163,8 @@ int cmd_script(int argc, const char **argv)
> >                     "Guest code can be found in hypervisor process"),
> >         OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
> >                     "Enable LBR callgraph stitching approach"),
> > +       OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains,
> > +                   "Enable merge deferred user callchains"),
> >         OPTS_EVSWITCH(&script.evswitch),
> >         OPT_END()
> >         };
> > @@ -4418,6 +4421,7 @@ int cmd_script(int argc, const char **argv)
> >         script.tool.throttle             = process_throttle_event;
> >         script.tool.unthrottle           = process_throttle_event;
> >         script.tool.ordering_requires_timestamps = true;
> > +       script.tool.merge_deferred_callchains = merge_deferred_callchains;
> >         session = perf_session__new(&data, &script.tool);
> >         if (IS_ERR(session))
> >                 return PTR_ERR(session);
> > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > index 2884187ccbbecfdc..71dc5a070065dd2a 100644
> > --- a/tools/perf/util/callchain.c
> > +++ b/tools/perf/util/callchain.c
> > @@ -1838,3 +1838,32 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
> >         }
> >         return 0;
> >  }
> > +
> > +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
> 
> nit: We use the term deferred rather than original except in this
> context. I think deferred is a little more intention revealing than
> original. Perhaps add a comment capturing that the original sample is
> the deferred kernel sample.

Sure, will add.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains
  2025-11-20 23:48 ` [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
  2025-11-21  6:26   ` Thomas Richter
@ 2025-12-03  5:49   ` Namhyung Kim
  1 sibling, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-12-03  5:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

On Thu, Nov 20, 2025 at 03:48:01PM -0800, Namhyung Kim wrote:
> Add a new callchain record mode option for deferred callchains.  For now
> it only works with FP (frame-pointer) mode.
> 
> And add the missing feature detection logic to clear the flag on old
> kernels.
> 
>   $ perf record --call-graph fp,defer -vv true
>   ...
>   ------------------------------------------------------------
>   perf_event_attr:
>     type                             0 (PERF_TYPE_HARDWARE)
>     size                             136
>     config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>     { sample_period, sample_freq }   4000
>     sample_type                      IP|TID|TIME|CALLCHAIN|PERIOD
>     read_format                      ID|LOST
>     disabled                         1
>     inherit                          1
>     mmap                             1
>     comm                             1
>     freq                             1
>     enable_on_exec                   1
>     task                             1
>     sample_id_all                    1
>     mmap2                            1
>     comm_exec                        1
>     ksymbol                          1
>     bpf_event                        1
>     defer_callchain                  1
>     defer_output                     1
>   ------------------------------------------------------------
>   sys_perf_event_open: pid 162755  cpu 0  group_fd -1  flags 0x8
>   sys_perf_event_open failed, error -22
>   switching off deferred callchain support
> 
> Reviewed-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-config.txt |  3 +++
>  tools/perf/Documentation/perf-record.txt |  4 ++++
>  tools/perf/util/callchain.c              | 16 +++++++++++++---
>  tools/perf/util/callchain.h              |  1 +
>  tools/perf/util/evsel.c                  | 19 +++++++++++++++++++
>  tools/perf/util/evsel.h                  |  1 +
>  6 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt
> index c6f33565966735fe..642d1c490d9e3bcd 100644
> --- a/tools/perf/Documentation/perf-config.txt
> +++ b/tools/perf/Documentation/perf-config.txt
> @@ -452,6 +452,9 @@ Variables
>  		kernel space is controlled not by this option but by the
>  		kernel config (CONFIG_UNWINDER_*).
>  
> +		The 'defer' mode can be used with 'fp' mode to enable deferred
> +		user callchains (like 'fp,defer').
> +
>  	call-graph.dump-size::
>  		The size of stack to dump in order to do post-unwinding. Default is 8192 (byte).
>  		When using dwarf into record-mode, the default size will be used if omitted.
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 067891bd7da6edc8..e8b9aadbbfa50574 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -325,6 +325,10 @@ OPTIONS
>  	by default.  User can change the number by passing it after comma
>  	like "--call-graph fp,32".
>  
> +	Also "defer" can be used with "fp" (like "--call-graph fp,defer") to
> +	enable deferred user callchain which will collect user-space callchains
> +	when the thread returns to the user space.
> +
>  -q::
>  --quiet::
>  	Don't print any warnings or messages, useful for scripting.
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index d7b7eef740b9d6ed..2884187ccbbecfdc 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -275,9 +275,13 @@ int parse_callchain_record(const char *arg, struct callchain_param *param)
>  			if (tok) {
>  				unsigned long size;
>  
> -				size = strtoul(tok, &name, 0);
> -				if (size < (unsigned) sysctl__max_stack())
> -					param->max_stack = size;
> +				if (!strncmp(tok, "defer", sizeof("defer"))) {
> +					param->defer = true;
> +				} else {
> +					size = strtoul(tok, &name, 0);
> +					if (size < (unsigned) sysctl__max_stack())
> +						param->max_stack = size;
> +				}
>  			}
>  			break;
>  
> @@ -314,6 +318,12 @@ int parse_callchain_record(const char *arg, struct callchain_param *param)
>  	} while (0);
>  
>  	free(buf);
> +
> +	if (param->defer && param->record_mode != CALLCHAIN_FP) {
> +		pr_err("callchain: deferred callchain only works with FP\n");
> +		return -EINVAL;
> +	}
> +
>  	return ret;
>  }
>  
> diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
> index 86ed9e4d04f9ee7b..d5ae4fbb7ce5fa44 100644
> --- a/tools/perf/util/callchain.h
> +++ b/tools/perf/util/callchain.h
> @@ -98,6 +98,7 @@ extern bool dwarf_callchain_users;
>  
>  struct callchain_param {
>  	bool			enabled;
> +	bool			defer;
>  	enum perf_call_graph_mode record_mode;
>  	u32			dump_size;
>  	enum chain_mode 	mode;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index f1a311637694ac0a..887c6ac6c49cc415 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1065,6 +1065,9 @@ static void __evsel__config_callchain(struct evsel *evsel, struct record_opts *o
>  		pr_info("Disabling user space callchains for function trace event.\n");
>  		attr->exclude_callchain_user = 1;
>  	}
> +
> +	if (param->defer && !attr->exclude_callchain_user)
> +		attr->defer_callchain = 1;
>  }
>  
>  void evsel__config_callchain(struct evsel *evsel, struct record_opts *opts,
> @@ -1511,6 +1514,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
>  	attr->mmap2    = track && !perf_missing_features.mmap2;
>  	attr->comm     = track;
>  	attr->build_id = track && opts->build_id;
> +	attr->defer_output = track && callchain->defer;

I need to update this part like below:

	attr->defer_output = track && callchain && callchain->defer;

Thanks,
Namhyung

>  
>  	/*
>  	 * ksymbol is tracked separately with text poke because it needs to be
> @@ -2199,6 +2203,10 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
>  
>  static void evsel__disable_missing_features(struct evsel *evsel)
>  {
> +	if (perf_missing_features.defer_callchain && evsel->core.attr.defer_callchain)
> +		evsel->core.attr.defer_callchain = 0;
> +	if (perf_missing_features.defer_callchain && evsel->core.attr.defer_output)
> +		evsel->core.attr.defer_output = 0;
>  	if (perf_missing_features.inherit_sample_read && evsel->core.attr.inherit &&
>  	    (evsel->core.attr.sample_type & PERF_SAMPLE_READ))
>  		evsel->core.attr.inherit = 0;
> @@ -2473,6 +2481,13 @@ static bool evsel__detect_missing_features(struct evsel *evsel, struct perf_cpu
>  
>  	/* Please add new feature detection here. */
>  
> +	attr.defer_callchain = true;
> +	if (has_attr_feature(&attr, /*flags=*/0))
> +		goto found;
> +	perf_missing_features.defer_callchain = true;
> +	pr_debug2("switching off deferred callchain support\n");
> +	attr.defer_callchain = false;
> +
>  	attr.inherit = true;
>  	attr.sample_type = PERF_SAMPLE_READ | PERF_SAMPLE_TID;
>  	if (has_attr_feature(&attr, /*flags=*/0))
> @@ -2584,6 +2599,10 @@ static bool evsel__detect_missing_features(struct evsel *evsel, struct perf_cpu
>  	errno = old_errno;
>  
>  check:
> +	if ((evsel->core.attr.defer_callchain || evsel->core.attr.defer_output) &&
> +	    perf_missing_features.defer_callchain)
> +		return true;
> +
>  	if (evsel->core.attr.inherit &&
>  	    (evsel->core.attr.sample_type & PERF_SAMPLE_READ) &&
>  	    perf_missing_features.inherit_sample_read)
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 3ae4ac8f9a37e009..a08130ff2e47a887 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -221,6 +221,7 @@ struct perf_missing_features {
>  	bool branch_counters;
>  	bool aux_action;
>  	bool inherit_sample_read;
> +	bool defer_callchain;
>  };
>  
>  extern struct perf_missing_features perf_missing_features;
> -- 
> 2.52.0.rc2.455.g230fcf2819-goog
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread
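
The option handling in patch 3/6 accepts either "defer" or a stack depth
after "fp,".  A minimal stand-alone sketch of that decision follows.  The
names struct fp_opts and parse_fp_suffix() are hypothetical, and the 'limit'
argument stands in for the value the patch reads from sysctl__max_stack().
As a deliberate difference from the patch, this sketch rejects tokens that
are neither "defer" nor a complete number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing the token that follows "fp,". */
struct fp_opts {
	bool defer;
	unsigned long max_stack;
};

/*
 * Parse the token after "fp,": either the literal "defer" or a stack
 * depth.  A depth at or above 'limit' is ignored, as in the patch.
 * Returns 0 on success and -1 for an unrecognised token.
 */
static int parse_fp_suffix(const char *tok, unsigned long limit,
			   struct fp_opts *opts)
{
	unsigned long size;
	char *end;

	assert(tok != NULL && opts != NULL);

	if (strcmp(tok, "defer") == 0) {
		opts->defer = true;
		return 0;
	}

	size = strtoul(tok, &end, 0);
	if (end == tok || *end != '\0')
		return -1;	/* not a number: stricter than the patch */

	if (size < limit)
		opts->max_stack = size;
	return 0;
}
```

The patch uses strncmp(tok, "defer", sizeof("defer")); because that length
includes the terminating NUL, it matches only the exact string, which is why
plain strcmp() is used here.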

* Re: [PATCHSET v6 0/6] perf tools: Add deferred callchain support
  2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
                   ` (5 preceding siblings ...)
  2025-11-20 23:48 ` [PATCH v6 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
@ 2025-12-03 17:58 ` Namhyung Kim
  6 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2025-12-03 17:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, James Clark, Namhyung Kim
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Jens Remus, Mathieu Desnoyers, linux-trace-kernel, bpf

On Thu, 20 Nov 2025 15:47:58 -0800, Namhyung Kim wrote:
> This is a new version of deferred callchain support as the kernel part
> is merged to the tip tree.  Actually this is based on Steve's work (v16).
> 
>   https://lore.kernel.org/r/20250908175319.841517121@kernel.org
> 
> v6 changes)
> 
> [...]
Applied to perf-tools-next, thanks!

Best regards,
Namhyung



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 5/6] perf tools: Merge deferred user callchains
  2025-11-20 23:48 ` [PATCH v6 5/6] perf tools: Merge deferred user callchains Namhyung Kim
  2025-12-02 23:14   ` Ian Rogers
@ 2025-12-12 11:16   ` Jens Remus
  2025-12-12 11:48     ` Jens Remus
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Remus @ 2025-12-12 11:16 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Mathieu Desnoyers, linux-trace-kernel, bpf

Hello Namhyung!

On 11/21/2025 12:48 AM, Namhyung Kim wrote:
> Save samples with deferred callchains in a separate list and deliver
> them after merging the user callchains.  If users don't want to merge
> they can set tool->merge_deferred_callchains to false to prevent the
> behavior.

> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c

> +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
> +				     struct perf_sample *sample_callchain)
> +{
> +	u64 nr_orig = sample_orig->callchain->nr - 1;
> +	u64 nr_deferred = sample_callchain->callchain->nr;
> +	struct ip_callchain *callchain;
> +
> +	if (sample_orig->callchain->nr < 2) {
> +		sample_orig->deferred_callchain = false;
> +		return -EINVAL;
> +	}
> +
> +	callchain = calloc(1 + nr_orig + nr_deferred, sizeof(u64));
> +	if (callchain == NULL) {
> +		sample_orig->deferred_callchain = false;
> +		return -ENOMEM;
> +	}
> +
> +	callchain->nr = nr_orig + nr_deferred;
> +	/* copy original including PERF_CONTEXT_USER_DEFERRED (but the cookie) */
> +	memcpy(callchain->ips, sample_orig->callchain->ips, nr_orig * sizeof(u64));
> +	/* copy deferred user callchains */
> +	memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips,
> +	       nr_deferred * sizeof(u64));
> +
> +	sample_orig->callchain = callchain;

Hope you don't mind my naive question, as I don't have a clue about perf:

Doesn't the sample_orig->callchain storage need to be free'd prior to
assigning the newly allocated one?  Or is that just part of a large
block that got allocated in one piece?  How is then the one allocated
here ever free'd?

> +	return 0;
> +}
Thanks and regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
+49-7031-16-1128 Office
jremus@de.ibm.com

IBM

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Böblingen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 5/6] perf tools: Merge deferred user callchains
  2025-12-12 11:16   ` Jens Remus
@ 2025-12-12 11:48     ` Jens Remus
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Remus @ 2025-12-12 11:48 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Mathieu Desnoyers, linux-trace-kernel, bpf

Hello Namhyung,

sorry for the fuss!

On 12/12/2025 12:16 PM, Jens Remus wrote:

> On 11/21/2025 12:48 AM, Namhyung Kim wrote:
>> Save samples with deferred callchains in a separate list and deliver
>> them after merging the user callchains.  If users don't want to merge
>> they can set tool->merge_deferred_callchains to false to prevent the
>> behavior.
> 
>> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> 
>> +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
>> +				     struct perf_sample *sample_callchain)
>> +{
>> +	u64 nr_orig = sample_orig->callchain->nr - 1;
>> +	u64 nr_deferred = sample_callchain->callchain->nr;
>> +	struct ip_callchain *callchain;
>> +
>> +	if (sample_orig->callchain->nr < 2) {
>> +		sample_orig->deferred_callchain = false;
>> +		return -EINVAL;
>> +	}
>> +
>> +	callchain = calloc(1 + nr_orig + nr_deferred, sizeof(u64));
>> +	if (callchain == NULL) {
>> +		sample_orig->deferred_callchain = false;
>> +		return -ENOMEM;
>> +	}
>> +
>> +	callchain->nr = nr_orig + nr_deferred;
>> +	/* copy original including PERF_CONTEXT_USER_DEFERRED (but the cookie) */
>> +	memcpy(callchain->ips, sample_orig->callchain->ips, nr_orig * sizeof(u64));
>> +	/* copy deferred user callchains */
>> +	memcpy(&callchain->ips[nr_orig], sample_callchain->callchain->ips,
>> +	       nr_deferred * sizeof(u64));
>> +
>> +	sample_orig->callchain = callchain;
> 
> Hope you don't mind my naive question, as I don't have a clue about perf:
> 
> Doesn't the sample_orig->callchain storage need to be free'd prior to
> assigning the newly allocated one?  Or is that just part of a large
> block that got allocated in one piece?  How is then the one allocated
> here ever free'd?

Never mind, I found that it is getting free'd in
evlist__deliver_deferred_callchain().
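
The ownership pattern discussed here can be sketched in isolation.  The
following is a minimal sketch, not the perf implementation: `struct chain`,
`struct sample`, the `merged` flag, `sample_merge()` and `sample_release()`
are hypothetical stand-ins, and it assumes the original callchain points
into the event record, so only the merged copy is ever heap-allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for perf's types; not the real definitions. */
struct chain {
	uint64_t nr;
	uint64_t ips[];
};

struct sample {
	struct chain *callchain;
	bool merged;	/* hypothetical flag: callchain was allocated by merge */
};

/* Same shape as the quoted merge: drop the trailing cookie, append deferred IPs. */
static int sample_merge(struct sample *orig, const struct chain *deferred)
{
	uint64_t nr_orig;
	struct chain *merged;

	assert(orig->callchain != NULL && deferred != NULL);

	/* Need at least the context marker plus the cookie. */
	if (orig->callchain->nr < 2)
		return -1;
	nr_orig = orig->callchain->nr - 1;

	merged = calloc(1 + nr_orig + deferred->nr, sizeof(uint64_t));
	if (merged == NULL)
		return -1;

	merged->nr = nr_orig + deferred->nr;
	memcpy(merged->ips, orig->callchain->ips, nr_orig * sizeof(uint64_t));
	memcpy(&merged->ips[nr_orig], deferred->ips,
	       deferred->nr * sizeof(uint64_t));

	/* Assumption: the old chain lives inside the event record, so it is not freed. */
	orig->callchain = merged;
	orig->merged = true;
	return 0;
}

/* Caller side: after delivering the sample, release only what merge allocated. */
static void sample_release(struct sample *s)
{
	if (s->merged) {
		free(s->callchain);
		s->callchain = NULL;
		s->merged = false;
	}
}
```

In this sketch the caller pairs every successful `sample_merge()` with a
`sample_release()` once the sample has been delivered, which is the role
evlist__deliver_deferred_callchain() is said to play above.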

> 
>> +	return 0;
>> +}
> Thanks and regards,
> Jens

Regards,
Jens


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
  2025-11-20 23:48 ` [PATCH v6 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Namhyung Kim
@ 2025-12-12 12:11   ` Jens Remus
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Remus @ 2025-12-12 12:11 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers, James Clark
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
	Mathieu Desnoyers, linux-trace-kernel, bpf, Heiko Carstens,
	Vasily Gorbik

Hello Namhyung,

The following is an observation from my attempt to enable unwind user fp
on s390, using the s390 back chain instead of a frame pointer and
relaxing the s390-specific IP validation check.

When capturing call graphs of a Java application, the list of "unwound"
user space IPs may contain invalid entries, such as 0x0, 0xdeaddeaf,
and 0xffffffffffffffff.  IPs at or above PERF_CONTEXT_MAX, such as the
last one, cause perf not to display any deferred (or merged) call chain.
Note that this is not caused by your patch series.

While re-adding the s390-specific IP checks would "hide" those, I found
that the call graphs look good otherwise.  That is, the back chain seems
to be intact.  It is just that the user space application (e.g. the Java
JRE) does not correctly adhere to the ABI by saving the return address to
the specified location on the stack, which causes bogus IPs to be
reported.

Could perf be improved to handle those user space IPs that exceed
PERF_CONTEXT_MAX?

Otherwise, is there guidance on how unwind user and/or the s390
implementation should deal with such IPs?  Should it stop taking the
deferred calltrace?  Should it substitute those with e.g. 0, so that
perf can display them?
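
To make the distinction concrete, here is a minimal sketch of how a
callchain entry falls into one of three classes.  The `CTX_*` macros
mirror the PERF_CONTEXT_* values as I recall them from
include/uapi/linux/perf_event.h (only the subset handled in the switch
quoted further down; please verify against the header), and
`classify_entry()` is a hypothetical helper, not a perf function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the PERF_CONTEXT_* values in the UAPI header. */
#define CTX_HV			((uint64_t)-32)
#define CTX_KERNEL		((uint64_t)-128)
#define CTX_USER		((uint64_t)-512)
#define CTX_USER_DEFERRED	((uint64_t)-640)
#define CTX_MAX			((uint64_t)-4095)

enum entry_kind { ENTRY_IP, ENTRY_CONTEXT, ENTRY_INVALID };

/* Hypothetical helper: decide what a raw callchain entry represents. */
static enum entry_kind classify_entry(uint64_t v)
{
	if (v < CTX_MAX)
		return ENTRY_IP;	/* ordinary address, even if bogus */

	switch (v) {
	case CTX_HV:
	case CTX_KERNEL:
	case CTX_USER:
	case CTX_USER_DEFERRED:
		return ENTRY_CONTEXT;	/* known context marker */
	default:
		return ENTRY_INVALID;	/* in the marker range, but unknown */
	}
}

/* Sanity check against the perf report -D output: fffffffffffffd80 is -640. */
static void classify_selfcheck(void)
{
	assert(classify_entry(0xfffffffffffffd80ULL) == ENTRY_CONTEXT);
}
```

With these values, 0x0 and 0xdeaddeaf are plain (if meaningless) IPs,
while 0xffffffffffffffff lands in the marker range without being a known
marker, which is why only that case is treated as corruption.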


Sample for IP == deaddeaf (perf displays this correctly):

java    1084    33.086790: DEFERRED CALLCHAIN [cookie: 2000001f9]
             3ff983f071c java.lang.String CryptoBench.crypt(java.lang.String)+0x89c (/tmp/perf-1082.map)
             3ff983ff894 void CryptoBench.execute()+0x94 (/tmp/perf-1082.map)
       ! -->    deaddeaf [unknown] ([unknown])
             3ff97e37900 StubRoutines (initial stubs)+0x80 (/tmp/perf-1082.map)
             3ff97e41080 Interpreter+0x3300 (/tmp/perf-1082.map)
             3ffae2d11de JavaCalls::call(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x3e (/usr/lib/jvm/.../libjvm.so)
             3ffae38df92 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x242 (/usr/lib/jvm/.../libjvm.so)
             3ffae390e86 jni_CallStaticVoidMethod+0x116 (/usr/lib/jvm/.../libjvm.so)
             3ffaf08b07e JavaMain+0x113e (/usr/lib/jvm/.../libjli.so)
             3ffaf08e040 ThreadJavaMain+0x20 (/usr/lib/jvm/.../libjli.so)
             3ffaedabbd8 start_thread+0x198 (/usr/lib64/libc.so.6)
             3ffaee2b950 thread_start+0x10 (/usr/lib64/libc.so.6)


Sample for IP == 0 (perf displays this correctly):

java    1084    33.021987: DEFERRED CALLCHAIN [cookie: 20000017b]
             3ff983f067c java.lang.String CryptoBench.crypt(java.lang.String)+0x7fc (/tmp/perf-1082.map)
             3ff9098aa88 void CryptoBench.execute()+0x748 (/tmp/perf-1082.map)
       ! -->           0 [unknown] ([unknown])
             3ff97e37900 StubRoutines (initial stubs)+0x80 (/tmp/perf-1082.map)
             3ff97e41080 Interpreter+0x3300 (/tmp/perf-1082.map)
             3ffae2d11de JavaCalls::call(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x3e (/usr/lib/jvm/.../libjvm.so)
             3ffae38df92 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x242 (/usr/lib/jvm/.../libjvm.so)
             3ffae390e86 jni_CallStaticVoidMethod+0x116 (/usr/lib/jvm/.../libjvm.so)
             3ffaf08b07e JavaMain+0x113e (/usr/lib/jvm/.../libjli.so)
             3ffaf08e040 ThreadJavaMain+0x20 (/usr/lib/jvm/.../libjli.so)
             3ffaedabbd8 start_thread+0x198 (/usr/lib64/libc.so.6)
             3ffaee2b950 thread_start+0x10 (/usr/lib64/libc.so.6)

Note that for the IP==0 case I am thinking about adding a common unwind
user check, to stop taking the deferred calltrace.


Sample for IP == ffffffffffffffff (perf does not display any call chain):

# perf script
...
java    1084    44.004346:    1001001 task-clock:ppp:

...
[next entry]

# perf script --no-merge-callchain
...
java    1084    44.004346:    1001001 task-clock:ppp:
               400000079 (cookie) ([unknown])

java    1084    44.004348: DEFERRED CALLCHAIN [cookie: 400000079]

...
[next entry]

# perf report -D
...
44004346257 0x17718 [0x40]: PERF_RECORD_SAMPLE(IP, 0x2): 1082/1084: 0x3ffa3e413aa period: 1001001 addr: 0
... FP chain: nr:2
.....  0: fffffffffffffd80
.....  1: 0000000400000079
...... (deferred)
 ... thread: java:1084
 ...... dso: /tmp/perf-1082.map

0x17758@perf.data [0xd0]: event: 22
.
. ... raw event: size 208 bytes
.  0000:  00 00 00 16 00 02 00 d0 00 00 00 04 00 00 00 79  ...............y
.  0010:  00 00 00 00 00 00 00 15 00 00 03 ff a3 e4 13 aa  ................
.  0020:  00 00 03 ff 38 09 e2 d0 00 00 03 ff 38 09 e1 30  ....8.......8..0
.  0030:  00 00 03 ff b9 5f df 68 00 00 00 00 00 00 00 00  ....._.h........
.  0040:  00 00 03 ff b9 5f e1 28 00 00 03 ff b9 5f e1 d0  ....._.(....._..
.  0050:  00 57 80 88 8e 76 47 a5 00 00 03 ff a3 e4 37 f2  .W...vG.......7.
.  0060:  ff ff ff ff ff ff ff ff 00 00 03 ff a3 e4 a1 fc  ................
.  0070:  00 00 00 00 00 00 00 00 00 00 03 ff a3 e3 79 00  ..............y.
.  0080:  00 00 03 ff a3 e4 10 80 00 00 03 ff b9 dd 11 de  ................
.  0090:  00 00 03 ff b9 e8 df 92 00 00 03 ff b9 e9 0e 86  ................
.  00a0:  00 00 03 ff ba b8 b0 7e 00 00 03 ff ba b8 e0 40  .......~.......@
.  00b0:  00 00 03 ff ba 8a bb d8 00 00 03 ff ba 92 b9 50  ...............P
.  00c0:  00 00 04 3a 00 00 04 3c 00 00 00 0a 3e dd 13 c0  ...:...<....>...

44004348864 0x17758 [0xd0]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 1082/1084: 0x400000079
... FP chain: nr:21
.....  0: 000003ffa3e413aa
.....  1: 000003ff3809e2d0
.....  2: 000003ff3809e130
.....  3: 000003ffb95fdf68
.....  4: 0000000000000000
.....  5: 000003ffb95fe128
.....  6: 000003ffb95fe1d0
.....  7: 005780888e7647a5
.....  8: 000003ffa3e437f2
.....  9: ffffffffffffffff <-- !
..... 10: 000003ffa3e4a1fc
..... 11: 0000000000000000
..... 12: 000003ffa3e37900
..... 13: 000003ffa3e41080
..... 14: 000003ffb9dd11de
..... 15: 000003ffb9e8df92
..... 16: 000003ffb9e90e86
..... 17: 000003ffbab8b07e
..... 18: 000003ffbab8e040
..... 19: 000003ffba8abbd8
..... 20: 000003ffba92b950
: unhandled!

...
[next entry]
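
For readers who want to cross-check the raw dump against the decoded
output, here is a minimal sketch that decodes the fixed part of the
record.  It assumes the layout header (u32 type, u16 misc, u16 size)
followed by a u64 cookie and a u64 nr, read big-endian as on s390;
`struct deferred_hdr`, `be_read()` and `decode_deferred()` are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian integer of len bytes (len <= 8). */
static uint64_t be_read(const uint8_t *p, size_t len)
{
	uint64_t v = 0;

	for (size_t i = 0; i < len; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Fixed-size prefix of the record, as assumed for this sketch. */
struct deferred_hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
	uint64_t cookie;
	uint64_t nr;
};

/* Decode header, cookie and nr from the first 24 bytes of a raw event. */
static struct deferred_hdr decode_deferred(const uint8_t *raw, size_t len)
{
	struct deferred_hdr h;

	assert(len >= 24);	/* 8-byte header + cookie + nr */
	h.type   = (uint32_t)be_read(raw + 0, 4);
	h.misc   = (uint16_t)be_read(raw + 4, 2);
	h.size   = (uint16_t)be_read(raw + 6, 2);
	h.cookie = be_read(raw + 8, 8);
	h.nr     = be_read(raw + 16, 8);
	return h;
}
```

Fed the first 24 bytes of the dump above (offsets 0000 to 0017), this
yields type 22, misc 2, size 208, cookie 0x400000079 and nr 21, matching
the "event: 22", "size 208 bytes" and "nr:21" lines.  The 21 entries
occupy 168 bytes, which leaves 16 trailing bytes after the 24-byte
prefix.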


On 11/21/2025 12:48 AM, Namhyung Kim wrote:
> Handle the deferred callchains in the script output.
> 
>   $ perf script
>   ...
>   pwd    2312   121.163435:     249113 cpu/cycles/P:
>           ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
>           ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
>           ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
>           ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
>           ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
>           ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
>           ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
>                  b00000006 (cookie) ([unknown])
> 
>   pwd    2312   121.163447: DEFERRED CALLCHAIN [cookie: b00000006]
>               7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>               7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>               7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c

> +static int process_deferred_sample_event(const struct perf_tool *tool,
> +					 union perf_event *event,
> +					 struct perf_sample *sample,
> +					 struct evsel *evsel,
> +					 struct machine *machine)
> +{

> +	perf_sample__fprintf_start(scr, sample, al.thread, evsel,
> +				   PERF_RECORD_CALLCHAIN_DEFERRED, fp);
> +	fprintf(fp, "DEFERRED CALLCHAIN [cookie: %llx]",
> +		(unsigned long long)event->callchain_deferred.cookie);
> +
> +	if (PRINT_FIELD(IP)) {
> +		struct callchain_cursor *cursor = NULL;
> +
> +		if (symbol_conf.use_callchain && sample->callchain) {
> +			cursor = get_tls_callchain_cursor();
> +			if (thread__resolve_callchain(al.thread, cursor, evsel,
> +						      sample, NULL, NULL,
> +						      scripting_max_stack)) {

thread__resolve_callchain()
calls __thread__resolve_callchain()
calls thread__resolve_callchain_sample():

        for (i = first_call, nr_entries = 0;
             i < chain_nr && nr_entries < max_stack; i++) {
...
                ip = chain->ips[j];
                if (ip < PERF_CONTEXT_MAX)   <-- IP=ff..ff is greater than PERF_CONTEXT_MAX
                       ++nr_entries;
...
                err = add_callchain_ip(thread, cursor, parent,
                                       root_al, &cpumode, ip,
                                       false, NULL, NULL, 0, symbols);

                if (err)
                        return (err < 0) ? err : 0;

calls add_callchain_ip:

               if (ip >= PERF_CONTEXT_MAX) {
                       switch (ip) {
                       case PERF_CONTEXT_HV:
                               *cpumode = PERF_RECORD_MISC_HYPERVISOR;
                               break;
                       case PERF_CONTEXT_KERNEL:
                               *cpumode = PERF_RECORD_MISC_KERNEL;
                               break;
                       case PERF_CONTEXT_USER:
                       case PERF_CONTEXT_USER_DEFERRED:
                               *cpumode = PERF_RECORD_MISC_USER;
                               break;
                       default:
                               pr_debug("invalid callchain context: "  <-- IP=ff..ff reaches default case
                                        "%"PRId64"\n", (s64) ip);
                               /*
                                * It seems the callchain is corrupted.
                                * Discard all.
                                */
                               callchain_cursor_reset(cursor);
                               err = 1;
                               goto out;
                       }

> +				pr_info("cannot resolve deferred callchains\n");
> +				cursor = NULL;
> +			}
> +		}
> +
> +		fputc(cursor ? '\n' : ' ', fp);
> +		sample__fprintf_sym(sample, &al, 0, output[type].print_ip_opts,
> +				    cursor, symbol_conf.bt_stop_list, fp);
> +	}
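
To illustrate the two policies in question, here is a minimal sketch
that contrasts the "discard all" behaviour of the default case quoted
above with a tolerant variant that substitutes 0 for an unknown marker.
Both functions are hypothetical simplifications and not perf code; the
`CTX_*` values are assumed to match the PERF_CONTEXT_* constants, and
whether perf should adopt the tolerant variant is exactly the open
question of this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the PERF_CONTEXT_* values in the UAPI header. */
#define CTX_HV			((uint64_t)-32)
#define CTX_KERNEL		((uint64_t)-128)
#define CTX_USER		((uint64_t)-512)
#define CTX_USER_DEFERRED	((uint64_t)-640)
#define CTX_MAX			((uint64_t)-4095)

static int known_context(uint64_t v)
{
	return v == CTX_HV || v == CTX_KERNEL ||
	       v == CTX_USER || v == CTX_USER_DEFERRED;
}

/* Strict policy: an unknown marker discards the whole chain. */
static size_t resolve_strict(const uint64_t *ips, size_t nr, uint64_t *out)
{
	size_t n = 0;

	assert(out != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (ips[i] >= CTX_MAX) {
			if (!known_context(ips[i]))
				return 0;	/* stands in for callchain_cursor_reset() */
			continue;		/* known marker: not an IP */
		}
		out[n++] = ips[i];
	}
	return n;
}

/* Tolerant policy (hypothetical): keep the chain, show the bogus entry as 0. */
static size_t resolve_tolerant(const uint64_t *ips, size_t nr, uint64_t *out)
{
	size_t n = 0;

	assert(out != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (ips[i] >= CTX_MAX) {
			if (!known_context(ips[i]))
				out[n++] = 0;	/* placeholder for the invalid IP */
			continue;
		}
		out[n++] = ips[i];
	}
	return n;
}
```

Applied to entries 8 to 10 of the chain above (000003ffa3e437f2,
ffffffffffffffff, 000003ffa3e4a1fc), the strict variant keeps nothing,
while the tolerant one keeps three entries with 0 in the middle.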
Thanks and regards,
Jens


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-12-12 12:12 UTC | newest]

Thread overview: 17+ messages
2025-11-20 23:47 [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
2025-11-20 23:47 ` [PATCH v6 1/6] tools headers UAPI: Sync linux/perf_event.h for deferred callchains Namhyung Kim
2025-11-20 23:48 ` [PATCH v6 2/6] perf tools: Minimal DEFERRED_CALLCHAIN support Namhyung Kim
2025-11-20 23:48 ` [PATCH v6 3/6] perf record: Add --call-graph fp,defer option for deferred callchains Namhyung Kim
2025-11-21  6:26   ` Thomas Richter
2025-11-24 20:27     ` Namhyung Kim
2025-12-03  5:49   ` Namhyung Kim
2025-11-20 23:48 ` [PATCH v6 4/6] perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Namhyung Kim
2025-12-12 12:11   ` Jens Remus
2025-11-20 23:48 ` [PATCH v6 5/6] perf tools: Merge deferred user callchains Namhyung Kim
2025-12-02 23:14   ` Ian Rogers
2025-12-03  0:01     ` Namhyung Kim
2025-12-12 11:16   ` Jens Remus
2025-12-12 11:48     ` Jens Remus
2025-11-20 23:48 ` [PATCH v6 6/6] perf tools: Flush remaining samples w/o deferred callchains Namhyung Kim
2025-12-02 23:15   ` Ian Rogers
2025-12-03 17:58 ` [PATCHSET v6 0/6] perf tools: Add deferred callchain support Namhyung Kim
