public inbox for linux-perf-users@vger.kernel.org
* [PATCH bpf-next v7 0/2] Pass external callchain entry to get_perf_callchain
@ 2025-12-17  5:12 Tao Chen
  2025-12-17  5:12 ` [PATCH bpf-next v7 1/2] perf: Refactor get_perf_callchain Tao Chen
  2025-12-17  5:12 ` [PATCH bpf-next v7 2/2] bpf: Hold the perf callchain entry until used completely Tao Chen
  0 siblings, 2 replies; 13+ messages in thread
From: Tao Chen @ 2025-12-17  5:12 UTC
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, song, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo
  Cc: linux-perf-users, linux-kernel, bpf, Tao Chen

Background
==========
Alexei noted that get_perf_callchain() should be protected with
preempt_disable() in the BPF stackmap.
https://lore.kernel.org/bpf/CAADnVQ+s8B7-fvR1TNO-bniSyKv57cH_ihRszmZV7pQDyV=VDQ@mail.gmail.com
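To make the problem concrete, below is a deliberately simplified, hypothetical user-space model (not kernel code; demo_get_entry(), demo_put_entry() and the other names are invented for illustration). It assumes what the patch 2 title implies: the callchain entry is released before BPF has finished reading it, so another user of the same CPU and context can reuse and overwrite it. Holding the entry until the read completes closes that window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical single-CPU, single-context model (not kernel code):
 * one shared callchain buffer plus a "busy" flag that stands in for
 * the perf recursion context.
 */
#define DEMO_DEPTH 4

struct demo_entry {
	uint64_t nr;
	uint64_t ip[DEMO_DEPTH];
};

static struct demo_entry shared_entry;
static int shared_busy;

/* Reserve the shared buffer and fill it with fake IPs; NULL if already held. */
static struct demo_entry *demo_get_entry(uint64_t base)
{
	if (shared_busy)
		return NULL;
	shared_busy = 1;
	shared_entry.nr = DEMO_DEPTH;
	for (uint64_t i = 0; i < DEMO_DEPTH; i++)
		shared_entry.ip[i] = base + i;
	return &shared_entry;
}

/* Release the reservation. */
static void demo_put_entry(void)
{
	assert(shared_busy);
	shared_busy = 0;
}

/* Another user on the same CPU and context that runs before the reader is done. */
static void demo_other_user(void)
{
	if (demo_get_entry(0x9000))
		demo_put_entry();
}

/* Release first, read later: the other user's data overwrites ours. */
uint64_t read_after_early_release(void)
{
	struct demo_entry *e = demo_get_entry(0x1000);

	assert(e != NULL);
	demo_put_entry();
	demo_other_user();
	return e->ip[0];
}

/* Hold the buffer until the read is done: the other user is refused. */
uint64_t read_while_holding(void)
{
	struct demo_entry *e = demo_get_entry(0x1000);
	uint64_t first;

	assert(e != NULL);
	demo_other_user();
	first = e->ip[0];
	demo_put_entry();
	return first;
}
```

In this model, read_after_early_release() returns 0x9000 (the other user's data) while read_while_holding() returns the expected 0x1000. In the kernel, a busy recursion context likewise makes a nested get_callchain_entry() return NULL; the model does not try to capture preempt_disable() itself.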

A previous patch attempted to fix this issue, and Andrii suggested
teaching get_perf_callchain() to accept that buffer directly, avoiding
the unnecessary copy.
https://lore.kernel.org/bpf/20250926153952.1661146-1-chen.dylane@linux.dev

Proposed Solution
=================
Add an external perf_callchain_entry parameter to get_perf_callchain()
so that the BPF side can supply its own buffer. The main advantage is
that this avoids unnecessary copies.
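A minimal sketch of the idea, assuming a toy model rather than the real perf code: model_get_callchain() unwinds directly into a caller-supplied buffer when one is passed and otherwise falls back to an internal entry, while model_copy_out() shows the copy that an external buffer makes unnecessary. All names here are hypothetical, and the actual get_perf_callchain() parameter list in this series may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model; the real struct perf_callchain_entry uses a flexible ip[] array. */
#define MODEL_MAX_STACK 8

struct model_callchain {
	uint64_t nr;
	uint64_t ip[MODEL_MAX_STACK];
};

/* Stands in for the per-CPU entry that perf would normally hand out. */
static struct model_callchain internal_entry;

/* Fake unwinder: record `depth` return addresses, 16 bytes apart, from `base`. */
static void model_unwind(struct model_callchain *e, uint64_t base, uint64_t depth)
{
	assert(depth <= MODEL_MAX_STACK);
	e->nr = 0;
	for (uint64_t i = 0; i < depth; i++)
		e->ip[e->nr++] = base + 16 * i;
}

/* Unwind straight into `external` when given, else into the internal entry. */
struct model_callchain *model_get_callchain(struct model_callchain *external,
					    uint64_t base, uint64_t depth)
{
	struct model_callchain *e = external ? external : &internal_entry;

	model_unwind(e, base, depth);
	return e;
}

/* Old-style path: unwind internally, then copy up to dst_len frames out. */
uint64_t model_copy_out(uint64_t *dst, uint64_t dst_len,
			uint64_t base, uint64_t depth)
{
	struct model_callchain *e = model_get_callchain(NULL, base, depth);
	uint64_t n = e->nr < dst_len ? e->nr : dst_len;

	memcpy(dst, e->ip, n * sizeof(*dst));
	return n;
}
```

With an external buffer the caller reads the frames in place; the fallback path needs the extra memcpy() into the caller's storage, which is the copy this series aims to avoid.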

Todo
====
However, I'm not sure whether this modification is appropriate. After
all, get_callchain_entry() in the perf subsystem seems much more complex
than directly using an external buffer.

Comments and suggestions are always welcome.

Change list:
 - v1 -> v2:
   From Jiri
   - rebase code and fix conflict
 - v1: https://lore.kernel.org/bpf/20251013174721.2681091-1-chen.dylane@linux.dev
 
 - v2 -> v3:
   From Andrii
   - use per-CPU entries in a stack-like fashion
 - v2: https://lore.kernel.org/bpf/20251014100128.2721104-1-chen.dylane@linux.dev

 - v3 -> v4:
   From Peter
   - refactor get_perf_callchain and add three new APIs to make the perf
     callchain easier to use.
   From Andrii
   - reuse the perf callchain management.

   - rename patch1 and patch2.
 - v3: https://lore.kernel.org/bpf/20251019170118.2955346-1-chen.dylane@linux.dev
 
 - v4 -> v5:
   From Yonghong
   - keep add_mark false in stackmap when refactoring get_perf_callchain
     in patch1.
   - add an atomic operation in get_recursion_context in patch2.
   - rename bpf_put_callchain_entry to bpf_put_perf_callchain in
     patch3.
   - rebase onto bpf-next master.
 - v4: https://lore.kernel.org/bpf/20251028162502.3418817-1-chen.dylane@linux.dev

 - v5 -> v6:
   From Peter
   - disable preemption on the BPF side in patch2.
   From AI
   - use ctx->entry->nr instead of ctx->nr in patch1.
 - v5: https://lore.kernel.org/bpf/20251109163559.4102849-1-chen.dylane@linux.dev

 - v6 -> v7:
   From Yonghong
   - add Ack in patch2
 - v6: https://lore.kernel.org/bpf/20251112163148.100949-1-chen.dylane@linux.dev
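The v2 -> v3 suggestion of using per-CPU entries in a stack-like fashion can be pictured with the hypothetical model below: each CPU owns a small array of entries and a depth counter, a nested user (for example an interrupt arriving while a BPF program still holds an entry) takes the next free entry, and entries are released in LIFO order. The names and the entry count of 4 are assumptions for illustration; as the change list notes, v4 switched to reusing the perf callchain management instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one CPU's entries; the count of 4 is an assumption. */
#define MODEL_NR_ENTRIES 4
#define MODEL_MAX_STACK  8

struct model_entry {
	uint64_t nr;
	uint64_t ip[MODEL_MAX_STACK];
};

struct model_cpu_entries {
	int depth;                                   /* entries currently in use */
	struct model_entry entries[MODEL_NR_ENTRIES];
};

/* Push: hand out the next free entry, or NULL when all are in use. */
struct model_entry *model_entry_get(struct model_cpu_entries *c)
{
	if (c->depth >= MODEL_NR_ENTRIES)
		return NULL;
	return &c->entries[c->depth++];
}

/* Pop: entries must be released in reverse order of acquisition. */
void model_entry_put(struct model_cpu_entries *c, struct model_entry *e)
{
	assert(c->depth > 0);
	assert(e == &c->entries[c->depth - 1]);
	c->depth--;
}
```

Once all four entries are held, a further nested request gets NULL in this model, which a caller would treat as "no callchain available".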

Tao Chen (2):
  perf: Refactor get_perf_callchain
  bpf: Hold the perf callchain entry until used completely

 include/linux/perf_event.h |  9 +++++
 kernel/bpf/stackmap.c      | 67 +++++++++++++++++++++++++++-------
 kernel/events/callchain.c  | 73 ++++++++++++++++++++++++--------------
 3 files changed, 111 insertions(+), 38 deletions(-)

-- 
2.48.1


* [PATCH RESEND bpf-next v7 0/2] Pass external callchain entry to get_perf_callchain
@ 2025-12-17  9:33 Tao Chen
  2025-12-17  9:33 ` [PATCH bpf-next v7 2/2] bpf: Hold the perf callchain entry until used completely Tao Chen
  0 siblings, 1 reply; 13+ messages in thread
From: Tao Chen @ 2025-12-17  9:33 UTC
  To: peterz, mingo, acme, namhyung, mark.rutland, alexander.shishkin,
	jolsa, irogers, adrian.hunter, kan.liang, song, ast, daniel,
	andrii, martin.lau, eddyz87, yonghong.song, john.fastabend,
	kpsingh, sdf, haoluo
  Cc: linux-perf-users, linux-kernel, bpf, Tao Chen

Background
==========
Alexei noted that get_perf_callchain() should be protected with
preempt_disable() in the BPF stackmap.
https://lore.kernel.org/bpf/CAADnVQ+s8B7-fvR1TNO-bniSyKv57cH_ihRszmZV7pQDyV=VDQ@mail.gmail.com

A previous patch attempted to fix this issue, and Andrii suggested
teaching get_perf_callchain() to accept that buffer directly, avoiding
the unnecessary copy.
https://lore.kernel.org/bpf/20250926153952.1661146-1-chen.dylane@linux.dev

Proposed Solution
=================
Add an external perf_callchain_entry parameter to get_perf_callchain()
so that the BPF side can supply its own buffer. The main advantage is
that this avoids unnecessary copies.

Todo
====
However, I'm not sure whether this modification is appropriate. After
all, get_callchain_entry() in the perf subsystem seems much more complex
than directly using an external buffer.

Comments and suggestions are always welcome.

Change list:
 - v1 -> v2:
   From Jiri
   - rebase code and fix conflict
 - v1: https://lore.kernel.org/bpf/20251013174721.2681091-1-chen.dylane@linux.dev

 - v2 -> v3:
   From Andrii
   - use per-CPU entries in a stack-like fashion
 - v2: https://lore.kernel.org/bpf/20251014100128.2721104-1-chen.dylane@linux.dev

 - v3 -> v4:
   From Peter
   - refactor get_perf_callchain and add three new APIs to make the perf
     callchain easier to use.
   From Andrii
   - reuse the perf callchain management.

   - rename patch1 and patch2.
 - v3: https://lore.kernel.org/bpf/20251019170118.2955346-1-chen.dylane@linux.dev

 - v4 -> v5:
   From Yonghong
   - keep add_mark false in stackmap when refactoring get_perf_callchain
     in patch1.
   - add an atomic operation in get_recursion_context in patch2.
   - rename bpf_put_callchain_entry to bpf_put_perf_callchain in
     patch3.
   - rebase onto bpf-next master.
 - v4: https://lore.kernel.org/bpf/20251028162502.3418817-1-chen.dylane@linux.dev

 - v5 -> v6:
   From Peter
   - disable preemption on the BPF side in patch2.
   From AI
   - use ctx->entry->nr instead of ctx->nr in patch1.
 - v5: https://lore.kernel.org/bpf/20251109163559.4102849-1-chen.dylane@linux.dev

 - v6 -> v7:
   From Yonghong
   - add Ack in patch2
   From AI
   - resolve conflict
 - v6: https://lore.kernel.org/bpf/20251112163148.100949-1-chen.dylane@linux.dev

Tao Chen (2):
  perf: Refactor get_perf_callchain
  bpf: Hold the perf callchain entry until used completely

 include/linux/perf_event.h | 10 ++++
 kernel/bpf/stackmap.c      | 68 +++++++++++++++++++++-----
 kernel/events/callchain.c  | 99 +++++++++++++++++++++++---------------
 3 files changed, 126 insertions(+), 51 deletions(-)

-- 
2.48.1




Thread overview: 13+ messages
2025-12-17  5:12 [PATCH bpf-next v7 0/2] Pass external callchain entry to get_perf_callchain Tao Chen
2025-12-17  5:12 ` [PATCH bpf-next v7 1/2] perf: Refactor get_perf_callchain Tao Chen
2025-12-17  5:12 ` [PATCH bpf-next v7 2/2] bpf: Hold the perf callchain entry until used completely Tao Chen
2025-12-17  5:22   ` Tao Chen
2025-12-17  9:11     ` Tao Chen
  -- strict thread matches above, loose matches on Subject: below --
2025-12-17  9:33 [PATCH RESEND bpf-next v7 0/2] Pass external callchain entry to get_perf_callchain Tao Chen
2025-12-17  9:33 ` [PATCH bpf-next v7 2/2] bpf: Hold the perf callchain entry until used completely Tao Chen
2025-12-23  6:29   ` Tao Chen
2026-01-06 16:00     ` Tao Chen
2026-01-09 23:47       ` Andrii Nakryiko
2026-01-16  4:35         ` Tao Chen
2026-01-23  0:38   ` Andrii Nakryiko
2026-01-23  5:42     ` Tao Chen
2026-01-23 18:40       ` Andrii Nakryiko
