From: Steven Rostedt <rostedt@kernel.org>
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
bpf@vger.kernel.org, x86@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Jiri Olsa <jolsa@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Andrii Nakryiko <andrii@kernel.org>,
Indu Bhagat <indu.bhagat@oracle.com>,
"Jose E. Marchesi" <jemarch@gnu.org>,
Beau Belgrave <beaub@linux.microsoft.com>,
Jens Remus <jremus@linux.ibm.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Florian Weimer <fweimer@redhat.com>, Sam James <sam@gentoo.org>,
Kees Cook <kees@kernel.org>,
"Carlos O'Donell" <codonell@redhat.com>
Subject: [PATCH v16 3/4] perf: Have the deferred request record the user context cookie
Date: Tue, 07 Oct 2025 17:40:11 -0400 [thread overview]
Message-ID: <20251007214123.872765932@kernel.org> (raw)
In-Reply-To: 20251007214008.080852573@kernel.org
From: Steven Rostedt <rostedt@goodmis.org>
When a request to have a deferred unwind is made, have the cookie
associated to the user context recorded in the event that represents that
request. It is added after the PERF_CONTEXT_USER_DEFERRED in the
callchain. That perf context is a marker of where to add the associated
user space stack trace in the callchain. Adding the cookie after that
marker will not affect the appending of the callchain as it will be
overwritten by the user space stack in the perf tool.
The cookie will be used to match the cookie that is saved when the
deferred callchain is recorded. The perf tool will be able to use the
cooking saved at the request to know if the callchain that was recorded
when the task goes back to user space is for that event. If there were
dropped events after the request was made where it dropped the calltrace
that happened when the task went back to user space and then came back
into the kernel and a new request was dropped, but then the record started
again and it recorded a new callchain going back to user space, this
callchain would not be for the initial request. The cookie matching will
prevent this scenario from happening.
The cookie prevents:
record kernel stack trace with PERF_CONTEXT_USER_DEFERRED
[ dropped events starts here ]
record user stack trace - DROPPED
[enters user space ]
[exits user space back to the kernel ]
record kernel stack trace with PERF_CONTEXT_USER_DEFERRED - DROPPED!
[ events stop being dropped here ]
record user stack trace
Without a differentiating "cookie" identifier, the user space tool will
incorrectly attach the last recorded user stack trace to the first kernel
stack trace with the PERF_CONTEXT_USER_DEFERRED, as using the TID is not
enough to identify this situation.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
include/linux/perf_event.h | 2 +-
include/uapi/linux/perf_event.h | 5 +++++
kernel/bpf/stackmap.c | 4 ++--
kernel/events/callchain.c | 9 ++++++---
kernel/events/core.c | 12 ++++++------
tools/include/uapi/linux/perf_event.h | 5 +++++
6 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 152e3dacff98..a0f95f751b44 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1723,7 +1723,7 @@ extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct p
extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
extern struct perf_callchain_entry *
get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
- u32 max_stack, bool crosstask, bool add_mark, bool defer_user);
+ u32 max_stack, bool crosstask, bool add_mark, u64 defer_cookie);
extern int get_callchain_buffers(int max_stack);
extern void put_callchain_buffers(void);
extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 20b8f890113b..79232e85a8fc 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1282,6 +1282,11 @@ enum perf_bpf_event_type {
#define PERF_MAX_STACK_DEPTH 127
#define PERF_MAX_CONTEXTS_PER_STACK 8
+/*
+ * The PERF_CONTEXT_USER_DEFERRED has two items (context and cookie)
+ */
+#define PERF_DEFERRED_ITEMS 2
+
enum perf_callchain_context {
PERF_CONTEXT_HV = (__u64)-32,
PERF_CONTEXT_KERNEL = (__u64)-128,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 339f7cbbcf36..ef6021111fe3 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -315,7 +315,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
max_depth = sysctl_perf_event_max_stack;
trace = get_perf_callchain(regs, kernel, user, max_depth,
- false, false, false);
+ false, false, 0);
if (unlikely(!trace))
/* couldn't fetch the stack trace */
@@ -452,7 +452,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
trace = get_callchain_entry_for_task(task, max_depth);
else
trace = get_perf_callchain(regs, kernel, user, max_depth,
- crosstask, false, false);
+ crosstask, false, 0);
if (unlikely(!trace) || trace->nr < skip) {
if (may_fault)
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index d0e0da66a164..b9c7e00725d6 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -218,7 +218,7 @@ static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entr
struct perf_callchain_entry *
get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
- u32 max_stack, bool crosstask, bool add_mark, bool defer_user)
+ u32 max_stack, bool crosstask, bool add_mark, u64 defer_cookie)
{
struct perf_callchain_entry *entry;
struct perf_callchain_entry_ctx ctx;
@@ -251,12 +251,15 @@ get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
regs = task_pt_regs(current);
}
- if (defer_user) {
+ if (defer_cookie) {
/*
* Foretell the coming of PERF_RECORD_CALLCHAIN_DEFERRED
- * which can be stitched to this one.
+ * which can be stitched to this one, and add
+ * the cookie after it (it will be cut off when the
+ * user stack is copied to the callchain).
*/
perf_callchain_store_context(&ctx, PERF_CONTEXT_USER_DEFERRED);
+ perf_callchain_store_context(&ctx, defer_cookie);
goto exit_put;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index be94b437e7e0..f003a1f9497a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8262,10 +8262,9 @@ static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
* 0 : if it performed the queuing
* < 0 : if it did not get queued.
*/
-static int deferred_request(struct perf_event *event)
+static int deferred_request(struct perf_event *event, u64 *defer_cookie)
{
struct unwind_work *work = &event->unwind_work;
- u64 cookie;
/* Only defer for task events */
if (!event->ctx->task)
@@ -8275,7 +8274,7 @@ static int deferred_request(struct perf_event *event)
!user_mode(task_pt_regs(current)))
return -EINVAL;
- return unwind_deferred_request(work, &cookie);
+ return unwind_deferred_request(work, defer_cookie);
}
struct perf_callchain_entry *
@@ -8288,6 +8287,7 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
bool crosstask = event->ctx->task && event->ctx->task != current;
const u32 max_stack = event->attr.sample_max_stack;
struct perf_callchain_entry *callchain;
+ u64 defer_cookie = 0;
bool defer_user = IS_ENABLED(CONFIG_UNWIND_USER) && user &&
event->attr.defer_callchain;
@@ -8302,15 +8302,15 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
return &__empty_callchain;
if (defer_user) {
- int ret = deferred_request(event);
+ int ret = deferred_request(event, &defer_cookie);
if (!ret)
local_inc(&event->ctx->nr_no_switch_fast);
else if (ret < 0)
- defer_user = false;
+ defer_cookie = 0;
}
callchain = get_perf_callchain(regs, kernel, user, max_stack,
- crosstask, true, defer_user);
+ crosstask, true, defer_cookie);
return callchain ?: &__empty_callchain;
}
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 20b8f890113b..79232e85a8fc 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1282,6 +1282,11 @@ enum perf_bpf_event_type {
#define PERF_MAX_STACK_DEPTH 127
#define PERF_MAX_CONTEXTS_PER_STACK 8
+/*
+ * The PERF_CONTEXT_USER_DEFERRED has two items (context and cookie)
+ */
+#define PERF_DEFERRED_ITEMS 2
+
enum perf_callchain_context {
PERF_CONTEXT_HV = (__u64)-32,
PERF_CONTEXT_KERNEL = (__u64)-128,
--
2.50.1
next prev parent reply other threads:[~2025-10-07 21:39 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-07 21:40 [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure Steven Rostedt
2025-10-07 21:40 ` [PATCH v16 1/4] unwind: Add interface to allow tracing a single task Steven Rostedt
2025-10-07 21:40 ` [PATCH v16 2/4] perf: Support deferred user callchains Steven Rostedt
2025-10-07 21:40 ` Steven Rostedt [this message]
2025-10-07 21:40 ` [PATCH v16 4/4] perf: Support deferred user callchains for per CPU events Steven Rostedt
2025-10-23 15:00 ` [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure Peter Zijlstra
2025-10-23 16:40 ` Steven Rostedt
2025-10-24 8:26 ` Peter Zijlstra
2025-10-24 12:58 ` Peter Zijlstra
2025-10-26 0:58 ` Namhyung Kim
2025-10-24 9:29 ` Peter Zijlstra
2025-10-24 10:41 ` Peter Zijlstra
2025-10-24 13:58 ` Jens Remus
2025-10-24 14:08 ` Peter Zijlstra
2025-10-24 14:51 ` Peter Zijlstra
2025-10-24 14:54 ` Peter Zijlstra
2025-10-24 14:57 ` Peter Zijlstra
2025-10-24 15:10 ` Peter Zijlstra
2025-10-24 15:09 ` Jens Remus
2025-10-24 15:11 ` Peter Zijlstra
2025-11-04 11:22 ` Florian Weimer
2025-11-05 13:08 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251007214123.872765932@kernel.org \
--to=rostedt@kernel.org \
--cc=acme@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=beaub@linux.microsoft.com \
--cc=bpf@vger.kernel.org \
--cc=codonell@redhat.com \
--cc=fweimer@redhat.com \
--cc=indu.bhagat@oracle.com \
--cc=jemarch@gnu.org \
--cc=jolsa@kernel.org \
--cc=jpoimboe@kernel.org \
--cc=jremus@linux.ibm.com \
--cc=kees@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mingo@kernel.org \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=sam@gentoo.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).