linux-perf-users.vger.kernel.org archive mirror
* [PATCH bpf-next RFC 0/2] Pass external callchain entry to get_perf_callchain
@ 2025-10-13 17:47 Tao Chen
  2025-10-13 17:47 ` [PATCH bpf-next RFC 1/2] perf: Use extern perf_callchain_entry for get_perf_callchain Tao Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Tao Chen @ 2025-10-13 17:47 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, song, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo
  Cc: linux-perf-users, linux-kernel, bpf, Tao Chen

Background
==========
Alexei noted that we should use preempt_disable to protect get_perf_callchain
in the BPF stackmap:
https://lore.kernel.org/bpf/CAADnVQ+s8B7-fvR1TNO-bniSyKv57cH_ihRszmZV7pQDyV=VDQ@mail.gmail.com

A previous patch was submitted to fix this issue, and Andrii suggested
teaching get_perf_callchain to accept a caller-supplied buffer directly,
avoiding the unnecessary copy:
https://lore.kernel.org/bpf/20250926153952.1661146-1-chen.dylane@linux.dev

Proposed Solution
=================
Add an external perf_callchain_entry parameter to get_perf_callchain so
that the BPF side can supply its own buffer. The main advantage is that
this avoids an unnecessary copy.

Todo
====
If the above changes are reasonable, get_callchain_entry_for_task could
also take an external perf_callchain_entry.

I am not sure whether that modification is appropriate, though: the
implementation of get_callchain_entry in the perf subsystem is
considerably more involved than simply using an external buffer.

Comments and suggestions are always welcome.

Tao Chen (2):
  perf: Use extern perf_callchain_entry for get_perf_callchain
  bpf: Pass external callchain entry to get_perf_callchain

 include/linux/perf_event.h |  5 +++--
 kernel/bpf/stackmap.c      | 19 +++++++++++--------
 kernel/events/callchain.c  | 18 ++++++++++++------
 kernel/events/core.c       |  2 +-
 4 files changed, 27 insertions(+), 17 deletions(-)

-- 
2.48.1



* [PATCH bpf-next RFC 1/2] perf: Use extern perf_callchain_entry for get_perf_callchain
  2025-10-13 17:47 [PATCH bpf-next RFC 0/2] Pass external callchain entry to get_perf_callchain Tao Chen
@ 2025-10-13 17:47 ` Tao Chen
  2025-10-13 17:47 ` [PATCH bpf-next RFC 2/2] bpf: Pass external callchain entry to get_perf_callchain Tao Chen
  2025-10-13 20:41 ` [PATCH bpf-next RFC 0/2] " Jiri Olsa
  2 siblings, 0 replies; 5+ messages in thread
From: Tao Chen @ 2025-10-13 17:47 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, song, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo
  Cc: linux-perf-users, linux-kernel, bpf, Tao Chen

The BPF stack map wants to use its own buffer to avoid an unnecessary
copy, so let callers pass one in directly.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
---
 include/linux/perf_event.h |  5 +++--
 kernel/bpf/stackmap.c      |  4 ++--
 kernel/events/callchain.c  | 18 ++++++++++++------
 kernel/events/core.c       |  2 +-
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ec9d9602568..ca69ad2723c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1719,8 +1719,9 @@ DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
 extern void perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs);
 extern struct perf_callchain_entry *
-get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark);
+get_perf_callchain(struct pt_regs *regs, struct perf_callchain_entry *external_entry,
+		   u32 init_nr, bool kernel, bool user, u32 max_stack, bool crosstask,
+		   bool add_mark);
 extern int get_callchain_buffers(int max_stack);
 extern void put_callchain_buffers(void);
 extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 2e182a3ac4c..e6e40f22826 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -314,7 +314,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	if (max_depth > sysctl_perf_event_max_stack)
 		max_depth = sysctl_perf_event_max_stack;
 
-	trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
+	trace = get_perf_callchain(regs, NULL, 0, kernel, user, max_depth,
 				   false, false);
 
 	if (unlikely(!trace))
@@ -451,7 +451,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	else if (kernel && task)
 		trace = get_callchain_entry_for_task(task, max_depth);
 	else
-		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
+		trace = get_perf_callchain(regs, NULL, 0, kernel, user, max_depth,
 					   crosstask, false);
 
 	if (unlikely(!trace) || trace->nr < skip) {
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 6c83ad674d0..fe5d2d58deb 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -217,16 +217,21 @@ static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entr
 }
 
 struct perf_callchain_entry *
-get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark)
+get_perf_callchain(struct pt_regs *regs, struct perf_callchain_entry *external_entry,
+		   u32 init_nr, bool kernel, bool user, u32 max_stack, bool crosstask,
+		   bool add_mark)
 {
 	struct perf_callchain_entry *entry;
 	struct perf_callchain_entry_ctx ctx;
 	int rctx, start_entry_idx;
 
-	entry = get_callchain_entry(&rctx);
-	if (!entry)
-		return NULL;
+	if (external_entry) {
+		entry = external_entry;
+	} else {
+		entry = get_callchain_entry(&rctx);
+		if (!entry)
+			return NULL;
+	}
 
 	ctx.entry     = entry;
 	ctx.max_stack = max_stack;
@@ -262,7 +267,8 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
 	}
 
 exit_put:
-	put_callchain_entry(rctx);
+	if (!external_entry)
+		put_callchain_entry(rctx);
 
 	return entry;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1d354778dcd..08ce44db18f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8204,7 +8204,7 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	if (!kernel && !user)
 		return &__empty_callchain;
 
-	callchain = get_perf_callchain(regs, 0, kernel, user,
+	callchain = get_perf_callchain(regs, NULL, 0, kernel, user,
 				       max_stack, crosstask, true);
 	return callchain ?: &__empty_callchain;
 }
-- 
2.48.1



* [PATCH bpf-next RFC 2/2] bpf: Pass external callchain entry to get_perf_callchain
  2025-10-13 17:47 [PATCH bpf-next RFC 0/2] Pass external callchain entry to get_perf_callchain Tao Chen
  2025-10-13 17:47 ` [PATCH bpf-next RFC 1/2] perf: Use extern perf_callchain_entry for get_perf_callchain Tao Chen
@ 2025-10-13 17:47 ` Tao Chen
  2025-10-13 20:41 ` [PATCH bpf-next RFC 0/2] " Jiri Olsa
  2 siblings, 0 replies; 5+ messages in thread
From: Tao Chen @ 2025-10-13 17:47 UTC (permalink / raw)
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, song, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo
  Cc: linux-perf-users, linux-kernel, bpf, Tao Chen

As Alexei noted, the buffer returned by get_perf_callchain() may be
reused if the task is preempted after the BPF program enters the
migrate-disabled state. Rather than a per-CPU array of
bpf_perf_callchain_entries, a stack-allocated bpf_perf_callchain_entry
is used here.

Signed-off-by: Tao Chen <chen.dylane@linux.dev>
---
 kernel/bpf/stackmap.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index e6e40f22826..1a51a2ea159 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -31,6 +31,11 @@ struct bpf_stack_map {
 	struct stack_map_bucket *buckets[] __counted_by(n_buckets);
 };
 
+struct bpf_perf_callchain_entry {
+	u64 nr;
+	u64 ip[PERF_MAX_STACK_DEPTH];
+};
+
 static inline bool stack_map_use_build_id(struct bpf_map *map)
 {
 	return (map->map_flags & BPF_F_STACK_BUILD_ID);
@@ -305,6 +310,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	bool user = flags & BPF_F_USER_STACK;
 	struct perf_callchain_entry *trace;
 	bool kernel = !user;
+	struct bpf_perf_callchain_entry entry = { 0 };
 
 	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
 			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
@@ -314,12 +320,8 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	if (max_depth > sysctl_perf_event_max_stack)
 		max_depth = sysctl_perf_event_max_stack;
 
-	trace = get_perf_callchain(regs, NULL, 0, kernel, user, max_depth,
-				   false, false);
-
-	if (unlikely(!trace))
-		/* couldn't fetch the stack trace */
-		return -EFAULT;
+	trace = get_perf_callchain(regs, (struct perf_callchain_entry *)&entry, 0,
+				   kernel, user, max_depth, false, false);
 
 	return __bpf_get_stackid(map, trace, flags);
 }
@@ -412,6 +414,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
 	bool user = flags & BPF_F_USER_STACK;
 	struct perf_callchain_entry *trace;
+	struct bpf_perf_callchain_entry entry = { 0 };
 	bool kernel = !user;
 	int err = -EINVAL;
 	u64 *ips;
@@ -451,8 +454,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 	else if (kernel && task)
 		trace = get_callchain_entry_for_task(task, max_depth);
 	else
-		trace = get_perf_callchain(regs, NULL, 0, kernel, user, max_depth,
-					   crosstask, false);
+		trace = get_perf_callchain(regs, (struct perf_callchain_entry *)&entry, 0,
+					   kernel, user, max_depth, crosstask, false);
 
 	if (unlikely(!trace) || trace->nr < skip) {
 		if (may_fault)
-- 
2.48.1



* Re: [PATCH bpf-next RFC 0/2] Pass external callchain entry to get_perf_callchain
  2025-10-13 17:47 [PATCH bpf-next RFC 0/2] Pass external callchain entry to get_perf_callchain Tao Chen
  2025-10-13 17:47 ` [PATCH bpf-next RFC 1/2] perf: Use extern perf_callchain_entry for get_perf_callchain Tao Chen
  2025-10-13 17:47 ` [PATCH bpf-next RFC 2/2] bpf: Pass external callchain entry to get_perf_callchain Tao Chen
@ 2025-10-13 20:41 ` Jiri Olsa
  2025-10-13 21:37   ` Yonghong Song
  2 siblings, 1 reply; 5+ messages in thread
From: Jiri Olsa @ 2025-10-13 20:41 UTC (permalink / raw)
  To: Tao Chen
  Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	irogers, adrian.hunter, kan.liang, song, ast, daniel, andrii,
	martin.lau, eddyz87, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, linux-perf-users, linux-kernel, bpf

On Tue, Oct 14, 2025 at 01:47:19AM +0800, Tao Chen wrote:
> Background
> ==========
> Alexei noted that we should use preempt_disable to protect get_perf_callchain
> in the BPF stackmap:
> https://lore.kernel.org/bpf/CAADnVQ+s8B7-fvR1TNO-bniSyKv57cH_ihRszmZV7pQDyV=VDQ@mail.gmail.com
> 
> A previous patch was submitted to fix this issue, and Andrii suggested
> teaching get_perf_callchain to accept a caller-supplied buffer directly,
> avoiding the unnecessary copy:
> https://lore.kernel.org/bpf/20250926153952.1661146-1-chen.dylane@linux.dev
> 
> Proposed Solution
> =================
> Add an external perf_callchain_entry parameter to get_perf_callchain so
> that the BPF side can supply its own buffer. The main advantage is that
> this avoids an unnecessary copy.
> 
> Todo
> ====
> If the above changes are reasonable, get_callchain_entry_for_task could
> also take an external perf_callchain_entry.
> 
> I am not sure whether that modification is appropriate, though: the
> implementation of get_callchain_entry in the perf subsystem is
> considerably more involved than simply using an external buffer.
> 
> Comments and suggestions are always welcome.
> 
> Tao Chen (2):
>   perf: Use extern perf_callchain_entry for get_perf_callchain
>   bpf: Pass external callchain entry to get_perf_callchain

hi,
I can't get this applied on bpf-next/master; what am I missing?

thanks,
jirka


> 
>  include/linux/perf_event.h |  5 +++--
>  kernel/bpf/stackmap.c      | 19 +++++++++++--------
>  kernel/events/callchain.c  | 18 ++++++++++++------
>  kernel/events/core.c       |  2 +-
>  4 files changed, 27 insertions(+), 17 deletions(-)
> 
> -- 
> 2.48.1
> 


* Re: [PATCH bpf-next RFC 0/2] Pass external callchain entry to get_perf_callchain
  2025-10-13 20:41 ` [PATCH bpf-next RFC 0/2] " Jiri Olsa
@ 2025-10-13 21:37   ` Yonghong Song
  0 siblings, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2025-10-13 21:37 UTC (permalink / raw)
  To: Jiri Olsa, Tao Chen
  Cc: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	irogers, adrian.hunter, kan.liang, song, ast, daniel, andrii,
	martin.lau, eddyz87, john.fastabend, kpsingh, sdf, haoluo,
	linux-perf-users, linux-kernel, bpf



On 10/13/25 1:41 PM, Jiri Olsa wrote:
> On Tue, Oct 14, 2025 at 01:47:19AM +0800, Tao Chen wrote:
>> Background
>> ==========
>> Alexei noted that we should use preempt_disable to protect get_perf_callchain
>> in the BPF stackmap:
>> https://lore.kernel.org/bpf/CAADnVQ+s8B7-fvR1TNO-bniSyKv57cH_ihRszmZV7pQDyV=VDQ@mail.gmail.com
>>
>> A previous patch was submitted to fix this issue, and Andrii suggested
>> teaching get_perf_callchain to accept a caller-supplied buffer directly,
>> avoiding the unnecessary copy:
>> https://lore.kernel.org/bpf/20250926153952.1661146-1-chen.dylane@linux.dev
>>
>> Proposed Solution
>> =================
>> Add an external perf_callchain_entry parameter to get_perf_callchain so
>> that the BPF side can supply its own buffer. The main advantage is that
>> this avoids an unnecessary copy.
>>
>> Todo
>> ====
>> If the above changes are reasonable, get_callchain_entry_for_task could
>> also take an external perf_callchain_entry.
>>
>> I am not sure whether that modification is appropriate, though: the
>> implementation of get_callchain_entry in the perf subsystem is
>> considerably more involved than simply using an external buffer.
>>
>> Comments and suggestions are always welcome.
>>
>> Tao Chen (2):
>>    perf: Use extern perf_callchain_entry for get_perf_callchain
>>    bpf: Pass external callchain entry to get_perf_callchain
> hi,
> I can't get this applied on bpf-next/master; what am I missing?

This patch is not based on top of the latest bpf/bpf-next tree.
The current diff:

  struct perf_callchain_entry *
-get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
-		   u32 max_stack, bool crosstask, bool add_mark)
+get_perf_callchain(struct pt_regs *regs, struct perf_callchain_entry *external_entry,
+		   u32 init_nr, bool kernel, bool user, u32 max_stack, bool crosstask,
+		   bool add_mark)
  {

The actual signature in kernel/events/callchain.c is:

struct perf_callchain_entry *
get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
                    u32 max_stack, bool crosstask, bool add_mark)
{


>
> thanks,
> jirka
>
>
>>   include/linux/perf_event.h |  5 +++--
>>   kernel/bpf/stackmap.c      | 19 +++++++++++--------
>>   kernel/events/callchain.c  | 18 ++++++++++++------
>>   kernel/events/core.c       |  2 +-
>>   4 files changed, 27 insertions(+), 17 deletions(-)
>>
>> -- 
>> 2.48.1
>>


