public inbox for bpf@vger.kernel.org
* [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs
@ 2026-04-10 17:09 Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-10 17:09 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

This series adds support for sleepable BPF programs attached to raw
tracepoints (tp_btf, raw_tp) and classic tracepoints (tp).
The motivation is to allow BPF programs on syscall
tracepoints to use sleepable helpers such as bpf_copy_from_user(),
enabling reliable user memory reads that can page-fault.
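
For illustration, the kind of program this enables looks roughly like the
below (a sketch modeled on the selftests in patch 6, not code from the
series itself):

  #include "vmlinux.h"
  #include <asm/unistd.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char _license[] SEC("license") = "GPL";

  SEC("tp_btf.s/sys_enter")            /* ".s" marks the program sleepable */
  int BPF_PROG(on_sys_enter, struct pt_regs *regs, long id)
  {
          struct __kernel_timespec ts;

          if (id != __NR_nanosleep)
                  return 0;

          /* May fault and page in user memory, hence the sleepable context */
          bpf_copy_from_user(&ts, sizeof(ts),
                             (void *)PT_REGS_PARM1_SYSCALL(regs));
          return 0;
  }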

This series removes the restriction that prevents sleepable BPF programs
from attaching to faultable tracepoints:

Patch 1 modifies __bpf_trace_run() to support sleepable programs.
Patch 2 introduces bpf_prog_run_array_sleepable() to support the new use case.

Patch 3 adds sleepable support for classic tracepoints
(BPF_PROG_TYPE_TRACEPOINT) by introducing trace_call_bpf_faultable()
and restructuring perf_syscall_enter/exit() to run BPF programs in
faultable context.

Patch 4 allows BPF_TRACE_RAW_TP, BPF_PROG_TYPE_RAW_TRACEPOINT, and
BPF_PROG_TYPE_TRACEPOINT programs to be loaded as sleepable, with
load-time and attach-time checks to reject sleepable programs on
non-faultable tracepoints.

Patch 5 adds libbpf SEC_DEF handlers: tp_btf.s, raw_tp.s,
raw_tracepoint.s, tp.s, and tracepoint.s.

Patch 6 adds selftests covering tp_btf.s, raw_tp.s, and tp.s positive
cases using bpf_copy_from_user() plus negative tests for non-faultable
tracepoints.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v9:
- Fixed "classic raw tracepoints" to "raw tracepoints (tp_btf, raw_tp)"
in commit message
- Added bpf_prog_get_recursion_context() guard to
__bpf_prog_test_run_raw_tp() to protect per-CPU private stack from
concurrent sleepable test runs
- Added new bpf_prog_run_array_sleepable() without is_uprobe parameter,
removed all changes to bpf_prog_run_array_uprobe()
- Refactored attach_tp() to use prefix array uniformly (matching
attach_raw_tp() pattern), removing hardcoded strcmp() bare-name checks.
- Recursion check in __bpf_prog_test_run_raw_tp()
- Refactored selftests
- Link to v8: https://patch.msgid.link/20260330-sleepable_tracepoints-v8-0-2e323467f3a0@meta.com

Changes in v8:
- Fix sleepable tracepoint support in bpf_prog_test_run() (Kumar, sashiko)
- Link to v7: https://patch.msgid.link/20260325-sleepable_tracepoints-v6-0-2b182dacea13@meta.com

Changes in v7:
- Add recursion check (bpf_prog_get_recursion_context()) to make sure
private stack is safe when sleepable program is preempted by itself
(Alexei, Kumar)
- Use combined rcu_read_lock_dont_migrate() instead of separate
rcu_read_lock()/migrate_disable() calls for non-sleepable path (Alexei)
- Link to v6: https://lore.kernel.org/bpf/20260324-sleepable_tracepoints-v6-0-81bab3a43f25@meta.com/

Changes in v6:
- Remove recursion check from trace_call_bpf_faultable(): sleepable
tracepoints are called from syscall enter/exit, so no recursion is
possible. (Kumar)
- Refactor bpf_prog_run_array_uprobe() to support the tracepoint
use case cleanly (Kumar)
- Link to v5: https://lore.kernel.org/r/20260316-sleepable_tracepoints-v5-0-85525de71d25@meta.com

Changes in v5:
- Addressed AI review: zero-initialize struct pt_regs in
perf_call_bpf_enter(); changed handling of tp.s and tracepoint.s in
attach_tp() in libbpf.
- Updated commit messages
- Link to v4: https://lore.kernel.org/r/20260313-sleepable_tracepoints-v4-0-debc688a66b3@meta.com

Changes in v4:
- Follow uprobe_prog_run() pattern with explicit rcu_read_lock_trace()
  instead of relying on outer rcu_tasks_trace lock
- Add sleepable support for classic raw tracepoints (raw_tp.s)
- Add sleepable support for classic tracepoints (tp.s) with new
  trace_call_bpf_faultable() and restructured perf_syscall_enter/exit()
- Add raw_tp.s, raw_tracepoint.s, tp.s, tracepoint.s SEC_DEF handlers
- Replace growing type enumeration in error message with generic
  "program of this type cannot be sleepable"
- Use PT_REGS_PARM1_SYSCALL (non-CO-RE) in BTF test
- Add classic raw_tp and classic tracepoint sleepable tests
- Link to v3: https://lore.kernel.org/r/20260311-sleepable_tracepoints-v3-0-3e9bbde5bd22@meta.com

Changes in v3:
  - Moved faultable tracepoint check from attach time to load time in
    bpf_check_attach_target(), providing a clear verifier error message
  - Folded preempt_disable removal into the sleepable execution path
    patch
  - Used RUN_TESTS() with __failure/__msg for negative test case instead
    of explicit userspace program
  - Reduced series from 6 patches to 4
  - Link to v2: https://lore.kernel.org/r/20260225-sleepable_tracepoints-v2-0-0330dafd650f@meta.com

Changes in v2:
  - Addressed AI review points: modified the order of the patches
  - Link to v1: https://lore.kernel.org/bpf/20260218-sleepable_tracepoints-v1-0-ec2705497208@meta.com/

---
Mykyta Yatsenko (6):
      bpf: Add sleepable support for raw tracepoint programs
      bpf: Add bpf_prog_run_array_sleepable()
      bpf: Add sleepable support for classic tracepoint programs
      bpf: Verifier support for sleepable tracepoint programs
      libbpf: Add section handlers for sleepable tracepoints
      selftests/bpf: Add tests for sleepable tracepoint programs

 include/linux/bpf.h                                |  50 +++++++
 include/linux/trace_events.h                       |   6 +
 include/trace/bpf_probe.h                          |   2 -
 kernel/bpf/syscall.c                               |   5 +
 kernel/bpf/verifier.c                              |  13 +-
 kernel/events/core.c                               |   9 ++
 kernel/trace/bpf_trace.c                           |  48 ++++++-
 kernel/trace/trace_syscalls.c                      | 110 ++++++++-------
 net/bpf/test_run.c                                 |  65 +++++++--
 tools/lib/bpf/libbpf.c                             |  88 +++++++-----
 .../bpf/prog_tests/sleepable_tracepoints.c         | 154 +++++++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints.c         | 125 +++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints_fail.c    |  18 +++
 tools/testing/selftests/bpf/verifier/sleepable.c   |  17 ++-
 14 files changed, 603 insertions(+), 107 deletions(-)
---
base-commit: 869997746f5b022e51e6fe5225dcfa25baf6a8bc
change-id: 20260216-sleepable_tracepoints-381ae1410550

Best regards,
--  
Mykyta Yatsenko <yatsenko@meta.com>



* [PATCH bpf-next v9 1/6] bpf: Add sleepable support for raw tracepoint programs
  2026-04-10 17:09 [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
@ 2026-04-10 17:09 ` Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-10 17:09 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Rework __bpf_trace_run() to support sleepable BPF programs by using
explicit RCU flavor selection, following the uprobe_prog_run() pattern.

For sleepable programs, use rcu_read_lock_tasks_trace() for lifetime
protection with migrate_disable(). For non-sleepable programs, use the
regular rcu_read_lock_dont_migrate().

Remove the preempt_disable_notrace/preempt_enable_notrace pair from
the faultable tracepoint BPF probe wrapper in bpf_probe.h, since
migration protection and RCU locking are now handled per-program
inside __bpf_trace_run().

Adapt bpf_prog_test_run_raw_tp() for sleepable programs: select
RCU flavor per-program, add per-program recursion context guard for
private stack safety, and reject BPF_F_TEST_RUN_ON_CPU since
sleepable programs cannot run in hardirq or preempt-disabled context.
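
Seen from user space, the new restriction is observable via the test-run
API; a rough sketch (prog_fd below is assumed to be the fd of a sleepable
raw_tp program):

  __u64 args[2] = { 0x1234, 0x5678 };
  LIBBPF_OPTS(bpf_test_run_opts, opts,
          .ctx_in = args,
          .ctx_size_in = sizeof(args),
          .flags = BPF_F_TEST_RUN_ON_CPU, /* now rejected for sleepable progs */
          .cpu = 0,
  );
  int err = bpf_prog_test_run_opts(prog_fd, &opts);
  /* err is expected to be -EINVAL */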

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/trace/bpf_probe.h |  2 --
 kernel/trace/bpf_trace.c  | 20 ++++++++++++---
 net/bpf/test_run.c        | 65 ++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 9391d54d3f12..d1de8f9aa07f 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -58,9 +58,7 @@ static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
 	might_fault();							\
-	preempt_disable_notrace();					\
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));	\
-	preempt_enable_notrace();					\
 }
 
 #undef DECLARE_EVENT_SYSCALL_CLASS
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index af7079aa0f36..4e763dd2aa2b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2072,11 +2072,19 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 static __always_inline
 void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 {
+	struct srcu_ctr __percpu *scp = NULL;
 	struct bpf_prog *prog = link->link.prog;
+	bool sleepable = prog->sleepable;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
 
-	rcu_read_lock_dont_migrate();
+	if (sleepable) {
+		scp = rcu_read_lock_tasks_trace();
+		migrate_disable();
+	} else {
+		rcu_read_lock_dont_migrate();
+	}
+
 	if (unlikely(!bpf_prog_get_recursion_context(prog))) {
 		bpf_prog_inc_misses_counter(prog);
 		goto out;
@@ -2085,12 +2093,18 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 	run_ctx.bpf_cookie = link->cookie;
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 
-	(void) bpf_prog_run(prog, args);
+	(void)bpf_prog_run(prog, args);
 
 	bpf_reset_run_ctx(old_run_ctx);
 out:
 	bpf_prog_put_recursion_context(prog);
-	rcu_read_unlock_migrate();
+
+	if (sleepable) {
+		migrate_enable();
+		rcu_read_unlock_tasks_trace(scp);
+	} else {
+		rcu_read_unlock_migrate();
+	}
 }
 
 #define UNPACK(...)			__VA_ARGS__
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 4cd6b3ea1815..63427a333730 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -748,14 +748,35 @@ static void
 __bpf_prog_test_run_raw_tp(void *data)
 {
 	struct bpf_raw_tp_test_run_info *info = data;
+	struct srcu_ctr __percpu *scp = NULL;
 	struct bpf_trace_run_ctx run_ctx = {};
 	struct bpf_run_ctx *old_run_ctx;
 
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 
-	rcu_read_lock();
+	if (info->prog->sleepable) {
+		scp = rcu_read_lock_tasks_trace();
+		migrate_disable();
+	} else {
+		rcu_read_lock();
+	}
+
+	if (unlikely(!bpf_prog_get_recursion_context(info->prog))) {
+		bpf_prog_inc_misses_counter(info->prog);
+		goto out;
+	}
+
 	info->retval = bpf_prog_run(info->prog, info->ctx);
-	rcu_read_unlock();
+
+out:
+	bpf_prog_put_recursion_context(info->prog);
+
+	if (info->prog->sleepable) {
+		migrate_enable();
+		rcu_read_unlock_tasks_trace(scp);
+	} else {
+		rcu_read_unlock();
+	}
 
 	bpf_reset_run_ctx(old_run_ctx);
 }
@@ -783,6 +804,13 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 	if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 && cpu != 0)
 		return -EINVAL;
 
+	/*
+	 * Sleepable programs cannot run with preemption disabled or in
+	 * hardirq context (smp_call_function_single), reject the flag.
+	 */
+	if (prog->sleepable && (kattr->test.flags & BPF_F_TEST_RUN_ON_CPU))
+		return -EINVAL;
+
 	if (ctx_size_in) {
 		info.ctx = memdup_user(ctx_in, ctx_size_in);
 		if (IS_ERR(info.ctx))
@@ -791,24 +819,31 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 		info.ctx = NULL;
 	}
 
+	info.retval = 0;
 	info.prog = prog;
 
-	current_cpu = get_cpu();
-	if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 ||
-	    cpu == current_cpu) {
+	if (prog->sleepable) {
 		__bpf_prog_test_run_raw_tp(&info);
-	} else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		/* smp_call_function_single() also checks cpu_online()
-		 * after csd_lock(). However, since cpu is from user
-		 * space, let's do an extra quick check to filter out
-		 * invalid value before smp_call_function_single().
-		 */
-		err = -ENXIO;
 	} else {
-		err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp,
-					       &info, 1);
+		current_cpu = get_cpu();
+		if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 ||
+		    cpu == current_cpu) {
+			__bpf_prog_test_run_raw_tp(&info);
+		} else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+			/*
+			 * smp_call_function_single() also checks cpu_online()
+			 * after csd_lock(). However, since cpu is from user
+			 * space, let's do an extra quick check to filter out
+			 * invalid value before smp_call_function_single().
+			 */
+			err = -ENXIO;
+		} else {
+			err = smp_call_function_single(cpu,
+						       __bpf_prog_test_run_raw_tp,
+						       &info, 1);
+		}
+		put_cpu();
 	}
-	put_cpu();
 
 	if (!err &&
 	    copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32)))

-- 
2.52.0



* [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-10 17:09 [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
@ 2026-04-10 17:09 ` Mykyta Yatsenko
  2026-04-10 22:55   ` Alexei Starovoitov
  2026-04-10 17:09 ` [PATCH bpf-next v9 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-10 17:09 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add bpf_prog_run_array_sleepable() for running BPF program arrays
on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
includes per-program recursion checking for private stack safety
and hardcodes is_uprobe to false.

Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0136a108d083..4e166accab35 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3077,6 +3077,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
 void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
 
+static __always_inline u32
+bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
+			     const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	struct bpf_prog *prog;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
+	u32 ret = 1;
+
+	might_fault();
+	RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
+
+	if (unlikely(!array))
+		return ret;
+
+	migrate_disable();
+
+	run_ctx.is_uprobe = false;
+
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	item = &array->items[0];
+	while ((prog = READ_ONCE(item->prog))) {
+		if (!prog->sleepable)
+			rcu_read_lock();
+
+		/* Per-prog recursion check to enable private stack. */
+		if (unlikely(!bpf_prog_get_recursion_context(prog))) {
+			bpf_prog_inc_misses_counter(prog);
+			bpf_prog_put_recursion_context(prog);
+			if (!prog->sleepable)
+				rcu_read_unlock();
+			item++;
+			continue;
+		}
+
+		run_ctx.bpf_cookie = item->bpf_cookie;
+		ret &= run_prog(prog, ctx);
+
+		bpf_prog_put_recursion_context(prog);
+		item++;
+
+		if (!prog->sleepable)
+			rcu_read_unlock();
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	migrate_enable();
+	return ret;
+}
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {

-- 
2.52.0



* [PATCH bpf-next v9 3/6] bpf: Add sleepable support for classic tracepoint programs
  2026-04-10 17:09 [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
@ 2026-04-10 17:09 ` Mykyta Yatsenko
  2026-04-10 19:39   ` Alexei Starovoitov
  2026-04-10 17:09 ` [PATCH bpf-next v9 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-10 17:09 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
faultable tracepoints that supports sleepable BPF programs. It uses
rcu_tasks_trace for lifetime protection and
bpf_prog_run_array_sleepable() for per-program RCU flavor selection,
following the uprobe_prog_run() pattern.

Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
programs before perf event processing. Previously, BPF ran after the
per-cpu perf trace buffer was allocated under preempt_disable,
requiring cleanup via perf_swevent_put_recursion_context() on filter.
Now BPF runs in faultable context before preempt_disable, reading
syscall arguments from local variables instead of the per-cpu trace
record, removing the dependency on buffer allocation. This allows
sleepable BPF programs to execute and avoids unnecessary buffer
allocation when BPF filters the event. The perf event submission
path (buffer allocation, fill, submit) remains under preempt_disable
as before. Since BPF no longer runs within the buffer allocation
context, the fake_regs output parameter to perf_trace_buf_alloc()
is no longer needed and is replaced with NULL.

Add an attach-time check in __perf_event_set_bpf_prog() to reject
sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
tracepoints, since only syscall tracepoints run in faultable context.

This prepares the classic tracepoint runtime and attach paths for
sleepable programs. The verifier changes to allow loading sleepable
BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch.

To: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # for BPF bits
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/linux/trace_events.h  |   6 +++
 kernel/events/core.c          |   9 ++++
 kernel/trace/bpf_trace.c      |  28 +++++++++++
 kernel/trace/trace_syscalls.c | 110 ++++++++++++++++++++++--------------------
 4 files changed, 101 insertions(+), 52 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 37eb2f0f3dd8..5fbbeb9ec4b9 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -767,6 +767,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 
 #ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
@@ -789,6 +790,11 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
 	return 1;
 }
 
+static inline unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	return 1;
+}
+
 static inline int
 perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
 {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 89b40e439717..054127525376 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11644,6 +11644,15 @@ static int __perf_event_set_bpf_prog(struct perf_event *event,
 		/* only uprobe programs are allowed to be sleepable */
 		return -EINVAL;
 
+	if (prog->type == BPF_PROG_TYPE_TRACEPOINT && prog->sleepable) {
+		/*
+		 * Sleepable tracepoint programs can only attach to faultable
+		 * tracepoints. Currently only syscall tracepoints are faultable.
+		 */
+		if (!is_syscall_tp)
+			return -EINVAL;
+	}
+
 	/* Kprobe override only works for kprobes, not uprobes. */
 	if (prog->kprobe_override && !is_kprobe)
 		return -EINVAL;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4e763dd2aa2b..487e56c04606 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -152,6 +152,34 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 	return ret;
 }
 
+/**
+ * trace_call_bpf_faultable - invoke BPF program in faultable context
+ * @call: tracepoint event
+ * @ctx: opaque context pointer
+ *
+ * Variant of trace_call_bpf() for faultable tracepoints (syscall
+ * tracepoints). Supports sleepable BPF programs by using rcu_tasks_trace
+ * for lifetime protection and bpf_prog_run_array_sleepable() for per-program
+ * RCU flavor selection, following the uprobe pattern.
+ *
+ * Recursion protection is unnecessary here (hence not touching bpf_prog_active
+ * compared to trace_call_bpf()), because syscall tracepoints fire only at
+ * syscall entry/exit boundaries, self-recursion is impossible.
+ *
+ * Must be called from a faultable/preemptible context.
+ */
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	struct bpf_prog_array *prog_array;
+
+	might_fault();
+	guard(rcu_tasks_trace)();
+
+	prog_array = rcu_dereference_check(call->prog_array,
+					   rcu_read_lock_trace_held());
+	return bpf_prog_run_array_sleepable(prog_array, ctx, bpf_prog_run);
+}
+
 #ifdef CONFIG_BPF_KPROBE_OVERRIDE
 BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
 {
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 37317b81fcda..d6e9c7d784e7 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -1372,33 +1372,33 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
 static int sys_perf_refcount_enter;
 static int sys_perf_refcount_exit;
 
-static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *regs,
+static int perf_call_bpf_enter(struct trace_event_call *call,
 			       struct syscall_metadata *sys_data,
-			       struct syscall_trace_enter *rec)
+			       int syscall_nr, unsigned long *args)
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
 		int syscall_nr;
 		unsigned long args[SYSCALL_DEFINE_MAXARGS];
 	} __aligned(8) param;
+	struct pt_regs regs = {};
 	int i;
 
 	BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *));
 
-	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
-	perf_fetch_caller_regs(regs);
-	*(struct pt_regs **)&param = regs;
-	param.syscall_nr = rec->nr;
+	/* bpf prog requires 'regs' to be the first member in the ctx */
+	perf_fetch_caller_regs(&regs);
+	*(struct pt_regs **)&param = &regs;
+	param.syscall_nr = syscall_nr;
 	for (i = 0; i < sys_data->nb_args; i++)
-		param.args[i] = rec->args[i];
-	return trace_call_bpf(call, &param);
+		param.args[i] = args[i];
+	return trace_call_bpf_faultable(call, &param);
 }
 
 static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_enter *rec;
-	struct pt_regs *fake_regs;
 	struct hlist_head *head;
 	unsigned long args[6];
 	bool valid_prog_array;
@@ -1411,12 +1411,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	int size = 0;
 	int uargs = 0;
 
-	/*
-	 * Syscall probe called with preemption enabled, but the ring
-	 * buffer and per-cpu data require preemption to be disabled.
-	 */
 	might_fault();
-	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
@@ -1430,6 +1425,26 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 
 	syscall_get_arguments(current, regs, args);
 
+	/*
+	 * Run BPF program in faultable context before per-cpu buffer
+	 * allocation, allowing sleepable BPF programs to execute.
+	 */
+	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
+	if (valid_prog_array &&
+	    !perf_call_bpf_enter(sys_data->enter_event, sys_data,
+				 syscall_nr, args))
+		return;
+
+	/*
+	 * Per-cpu ring buffer and perf event list operations require
+	 * preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
+	head = this_cpu_ptr(sys_data->enter_event->perf_events);
+	if (hlist_empty(head))
+		return;
+
 	/* Check if this syscall event faults in user space memory */
 	mayfault = sys_data->user_mask != 0;
 
@@ -1439,17 +1454,12 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 			return;
 	}
 
-	head = this_cpu_ptr(sys_data->enter_event->perf_events);
-	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
-	if (!valid_prog_array && hlist_empty(head))
-		return;
-
 	/* get the size after alignment with the u32 buffer size field */
 	size += sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	rec = perf_trace_buf_alloc(size, &fake_regs, &rctx);
+	rec = perf_trace_buf_alloc(size, NULL, &rctx);
 	if (!rec)
 		return;
 
@@ -1459,13 +1469,6 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	if (mayfault)
 		syscall_put_data(sys_data, rec, user_ptr, size, user_sizes, uargs);
 
-	if ((valid_prog_array &&
-	     !perf_call_bpf_enter(sys_data->enter_event, fake_regs, sys_data, rec)) ||
-	    hlist_empty(head)) {
-		perf_swevent_put_recursion_context(rctx);
-		return;
-	}
-
 	perf_trace_buf_submit(rec, size, rctx,
 			      sys_data->enter_event->event.type, 1, regs,
 			      head, NULL);
@@ -1515,40 +1518,35 @@ static void perf_sysenter_disable(struct trace_event_call *call)
 		syscall_fault_buffer_disable();
 }
 
-static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *regs,
-			      struct syscall_trace_exit *rec)
+static int perf_call_bpf_exit(struct trace_event_call *call,
+			      int syscall_nr, long ret_val)
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
 		int syscall_nr;
 		unsigned long ret;
 	} __aligned(8) param;
-
-	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
-	perf_fetch_caller_regs(regs);
-	*(struct pt_regs **)&param = regs;
-	param.syscall_nr = rec->nr;
-	param.ret = rec->ret;
-	return trace_call_bpf(call, &param);
+	struct pt_regs regs = {};
+
+	/* bpf prog requires 'regs' to be the first member in the ctx */
+	perf_fetch_caller_regs(&regs);
+	*(struct pt_regs **)&param = &regs;
+	param.syscall_nr = syscall_nr;
+	param.ret = ret_val;
+	return trace_call_bpf_faultable(call, &param);
 }
 
 static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_exit *rec;
-	struct pt_regs *fake_regs;
 	struct hlist_head *head;
 	bool valid_prog_array;
 	int syscall_nr;
 	int rctx;
 	int size;
 
-	/*
-	 * Syscall probe called with preemption enabled, but the ring
-	 * buffer and per-cpu data require preemption to be disabled.
-	 */
 	might_fault();
-	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
@@ -1560,29 +1558,37 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
-	head = this_cpu_ptr(sys_data->exit_event->perf_events);
+	/*
+	 * Run BPF program in faultable context before per-cpu buffer
+	 * allocation, allowing sleepable BPF programs to execute.
+	 */
 	valid_prog_array = bpf_prog_array_valid(sys_data->exit_event);
-	if (!valid_prog_array && hlist_empty(head))
+	if (valid_prog_array &&
+	    !perf_call_bpf_exit(sys_data->exit_event, syscall_nr,
+				syscall_get_return_value(current, regs)))
+		return;
+
+	/*
+	 * Per-cpu ring buffer and perf event list operations require
+	 * preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
+	head = this_cpu_ptr(sys_data->exit_event->perf_events);
+	if (hlist_empty(head))
 		return;
 
 	/* We can probably do that at build time */
 	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	rec = perf_trace_buf_alloc(size, &fake_regs, &rctx);
+	rec = perf_trace_buf_alloc(size, NULL, &rctx);
 	if (!rec)
 		return;
 
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	if ((valid_prog_array &&
-	     !perf_call_bpf_exit(sys_data->exit_event, fake_regs, rec)) ||
-	    hlist_empty(head)) {
-		perf_swevent_put_recursion_context(rctx);
-		return;
-	}
-
 	perf_trace_buf_submit(rec, size, rctx, sys_data->exit_event->event.type,
 			      1, regs, head, NULL);
 }

-- 
2.52.0



* [PATCH bpf-next v9 4/6] bpf: Verifier support for sleepable tracepoint programs
  2026-04-10 17:09 [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (2 preceding siblings ...)
  2026-04-10 17:09 ` [PATCH bpf-next v9 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
@ 2026-04-10 17:09 ` Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
  5 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-10 17:09 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Allow BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_TRACEPOINT, and
BPF_TRACE_RAW_TP (tp_btf) programs to be sleepable by adding them
to can_be_sleepable().

For BTF-based raw tracepoints (tp_btf), add a load-time check in
bpf_check_attach_target() that rejects sleepable programs attaching
to non-faultable tracepoints with a descriptive error message.

For raw tracepoints (raw_tp), add an attach-time check in
bpf_raw_tp_link_attach() that rejects sleepable programs on
non-faultable tracepoints. The attach-time check is needed because
the tracepoint name is not known at load time for raw_tp.

The attach-time check for classic tracepoints (tp) in
__perf_event_set_bpf_prog() was added in the previous patch.

Replace the verbose error message that enumerates allowed program
types with a generic "Program of this type cannot be sleepable"
message, since the list of sleepable-capable types keeps growing.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 kernel/bpf/syscall.c  |  5 +++++
 kernel/bpf/verifier.c | 13 +++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b73b25c63073..ce00e87431c8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4285,6 +4285,11 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 	if (!btp)
 		return -ENOENT;
 
+	if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+		bpf_put_raw_tracepoint(btp);
+		return -EINVAL;
+	}
+
 	link = kzalloc_obj(*link, GFP_USER);
 	if (!link) {
 		err = -ENOMEM;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9c1135d373e2..fcb497fc293a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -25753,6 +25753,12 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		btp = bpf_get_raw_tracepoint(tname);
 		if (!btp)
 			return -EINVAL;
+		if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+			bpf_log(log, "Sleepable program cannot attach to non-faultable tracepoint %s\n",
+				tname);
+			bpf_put_raw_tracepoint(btp);
+			return -EINVAL;
+		}
 		fname = kallsyms_lookup((unsigned long)btp->bpf_func, NULL, NULL, NULL,
 					trace_symbol);
 		bpf_put_raw_tracepoint(btp);
@@ -25969,6 +25975,7 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 		case BPF_MODIFY_RETURN:
 		case BPF_TRACE_ITER:
 		case BPF_TRACE_FSESSION:
+		case BPF_TRACE_RAW_TP:
 			return true;
 		default:
 			return false;
@@ -25976,7 +25983,9 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 	}
 	return prog->type == BPF_PROG_TYPE_LSM ||
 	       prog->type == BPF_PROG_TYPE_KPROBE /* only for uprobes */ ||
-	       prog->type == BPF_PROG_TYPE_STRUCT_OPS;
+	       prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
+	       prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT ||
+	       prog->type == BPF_PROG_TYPE_TRACEPOINT;
 }
 
 static int check_attach_btf_id(struct bpf_verifier_env *env)
@@ -25998,7 +26007,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	}
 
 	if (prog->sleepable && !can_be_sleepable(prog)) {
-		verbose(env, "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable\n");
+		verbose(env, "Program of this type cannot be sleepable\n");
 		return -EINVAL;
 	}
 

-- 
2.52.0



* [PATCH bpf-next v9 5/6] libbpf: Add section handlers for sleepable tracepoints
  2026-04-10 17:09 [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (3 preceding siblings ...)
  2026-04-10 17:09 ` [PATCH bpf-next v9 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
@ 2026-04-10 17:09 ` Mykyta Yatsenko
  2026-04-10 17:09 ` [PATCH bpf-next v9 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
  5 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-10 17:09 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add SEC_DEF entries for sleepable tracepoint variants:
  - "tp_btf.s+"     for sleepable BTF-based raw tracepoints
  - "raw_tp.s+"     for sleepable raw tracepoints
  - "raw_tracepoint.s+" (alias)
  - "tp.s+"         for sleepable classic tracepoints
  - "tracepoint.s+" (alias)

Extract sec_name_match_prefix() to share the prefix matching logic
between attach_tp() and attach_raw_tp(), eliminating duplicated
loops and hardcoded strcmp() checks for bare section names.
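
With these handlers, a sleepable classic tracepoint program can be written
and auto-attached the usual way; a minimal sketch (mirroring the selftests
in the last patch):

  SEC("tp.s/syscalls/sys_enter_nanosleep")
  int handle_nanosleep(struct syscall_trace_enter *ctx)
  {
          struct __kernel_timespec ts;

          /* sleepable helper, allowed because of the ".s" suffix */
          bpf_copy_from_user(&ts, sizeof(ts), (void *)ctx->args[0]);
          return 0;
  }

A bare SEC("tp.s") or SEC("raw_tp.s") still loads but is skipped by
auto-attach, matching the existing bare-name behavior.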

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 tools/lib/bpf/libbpf.c | 88 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 31 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 42bdba4efd0c..5cb040a13490 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10009,11 +10009,16 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("netkit/peer",		SCHED_CLS, BPF_NETKIT_PEER, SEC_NONE),
 	SEC_DEF("tracepoint+",		TRACEPOINT, 0, SEC_NONE, attach_tp),
 	SEC_DEF("tp+",			TRACEPOINT, 0, SEC_NONE, attach_tp),
+	SEC_DEF("tracepoint.s+",	TRACEPOINT, 0, SEC_SLEEPABLE, attach_tp),
+	SEC_DEF("tp.s+",		TRACEPOINT, 0, SEC_SLEEPABLE, attach_tp),
 	SEC_DEF("raw_tracepoint+",	RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("raw_tp+",		RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
+	SEC_DEF("raw_tracepoint.s+",	RAW_TRACEPOINT, 0, SEC_SLEEPABLE, attach_raw_tp),
+	SEC_DEF("raw_tp.s+",		RAW_TRACEPOINT, 0, SEC_SLEEPABLE, attach_raw_tp),
 	SEC_DEF("raw_tracepoint.w+",	RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("raw_tp.w+",		RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("tp_btf+",		TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF, attach_trace),
+	SEC_DEF("tp_btf.s+",		TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
 	SEC_DEF("fentry+",		TRACING, BPF_TRACE_FENTRY, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("fmod_ret+",		TRACING, BPF_MODIFY_RETURN, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("fexit+",		TRACING, BPF_TRACE_FEXIT, SEC_ATTACH_BTF, attach_trace),
@@ -13136,25 +13141,61 @@ struct bpf_link *bpf_program__attach_tracepoint(const struct bpf_program *prog,
 	return bpf_program__attach_tracepoint_opts(prog, tp_category, tp_name, NULL);
 }
 
+/*
+ * Match section name against a prefix array. Returns pointer past
+ * "prefix/" on match, empty string for bare sections (exact prefix
+ * match), or NULL if no prefix matches.
+ */
+static const char *sec_name_match_prefix(const char *sec_name,
+					 const char *const *prefixes,
+					 size_t n)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+		size_t pfx_len;
+
+		if (!str_has_pfx(sec_name, prefixes[i]))
+			continue;
+
+		pfx_len = strlen(prefixes[i]);
+		if (sec_name[pfx_len] == '\0')
+			return sec_name + pfx_len;
+
+		if (sec_name[pfx_len] != '/' || sec_name[pfx_len + 1] == '\0')
+			continue;
+
+		return sec_name + pfx_len + 1;
+	}
+	return NULL;
+}
+
 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link)
 {
+	static const char *const prefixes[] = {
+		"tp.s",
+		"tp",
+		"tracepoint.s",
+		"tracepoint",
+	};
 	char *sec_name, *tp_cat, *tp_name;
+	const char *match;
 
 	*link = NULL;
 
-	/* no auto-attach for SEC("tp") or SEC("tracepoint") */
-	if (strcmp(prog->sec_name, "tp") == 0 || strcmp(prog->sec_name, "tracepoint") == 0)
+	match = sec_name_match_prefix(prog->sec_name, prefixes, ARRAY_SIZE(prefixes));
+	if (!match) {
+		pr_warn("prog '%s': invalid section name '%s'\n", prog->name, prog->sec_name);
+		return -EINVAL;
+	}
+	if (!match[0]) /* bare section name no autoattach */
 		return 0;
 
 	sec_name = strdup(prog->sec_name);
 	if (!sec_name)
 		return -ENOMEM;
 
-	/* extract "tp/<category>/<name>" or "tracepoint/<category>/<name>" */
-	if (str_has_pfx(prog->sec_name, "tp/"))
-		tp_cat = sec_name + sizeof("tp/") - 1;
-	else
-		tp_cat = sec_name + sizeof("tracepoint/") - 1;
+	tp_cat = sec_name + (match - prog->sec_name);
 	tp_name = strchr(tp_cat, '/');
 	if (!tp_name) {
 		free(sec_name);
@@ -13218,37 +13259,22 @@ static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf
 		"raw_tracepoint",
 		"raw_tp.w",
 		"raw_tracepoint.w",
+		"raw_tp.s",
+		"raw_tracepoint.s",
 	};
-	size_t i;
-	const char *tp_name = NULL;
+	const char *match;
 
 	*link = NULL;
 
-	for (i = 0; i < ARRAY_SIZE(prefixes); i++) {
-		size_t pfx_len;
-
-		if (!str_has_pfx(prog->sec_name, prefixes[i]))
-			continue;
-
-		pfx_len = strlen(prefixes[i]);
-		/* no auto-attach case of, e.g., SEC("raw_tp") */
-		if (prog->sec_name[pfx_len] == '\0')
-			return 0;
-
-		if (prog->sec_name[pfx_len] != '/')
-			continue;
-
-		tp_name = prog->sec_name + pfx_len + 1;
-		break;
-	}
-
-	if (!tp_name) {
-		pr_warn("prog '%s': invalid section name '%s'\n",
-			prog->name, prog->sec_name);
+	match = sec_name_match_prefix(prog->sec_name, prefixes, ARRAY_SIZE(prefixes));
+	if (!match) {
+		pr_warn("prog '%s': invalid section name '%s'\n", prog->name, prog->sec_name);
 		return -EINVAL;
 	}
+	if (!match[0])
+		return 0;
 
-	*link = bpf_program__attach_raw_tracepoint(prog, tp_name);
+	*link = bpf_program__attach_raw_tracepoint(prog, match);
 	return libbpf_get_error(*link);
 }
 

-- 
2.52.0



* [PATCH bpf-next v9 6/6] selftests/bpf: Add tests for sleepable tracepoint programs
  2026-04-10 17:09 [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (4 preceding siblings ...)
  2026-04-10 17:09 ` [PATCH bpf-next v9 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
@ 2026-04-10 17:09 ` Mykyta Yatsenko
  5 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-10 17:09 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Cover all three sleepable tracepoint types (tp_btf.s, raw_tp.s, tp.s)
and sys_exit (via bpf_task_pt_regs) with functional tests using
bpf_copy_from_user() on nanosleep. Verify alias and bare SEC variants,
bpf_prog_test_run_raw_tp() with BPF_F_TEST_RUN_ON_CPU rejection,
attach-time rejection on non-faultable tracepoints, and load-time
rejection for sleepable tp_btf on non-faultable tracepoints.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 .../bpf/prog_tests/sleepable_tracepoints.c         | 154 +++++++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints.c         | 125 +++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints_fail.c    |  18 +++
 tools/testing/selftests/bpf/verifier/sleepable.c   |  17 ++-
 4 files changed, 312 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
new file mode 100644
index 000000000000..cd2b0e916fab
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
@@ -0,0 +1,154 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <time.h>
+#include "test_sleepable_tracepoints.skel.h"
+#include "test_sleepable_tracepoints_fail.skel.h"
+
+static void run_test(struct test_sleepable_tracepoints *skel)
+{
+	skel->bss->target_pid = getpid();
+	skel->bss->prog_triggered = 0;
+	skel->bss->err = 0;
+	skel->bss->copied_tv_nsec = 0;
+
+	syscall(__NR_nanosleep, &(struct timespec){ .tv_nsec = 555 }, NULL);
+
+	ASSERT_EQ(skel->bss->prog_triggered, 1, "prog_triggered");
+	ASSERT_EQ(skel->bss->err, 0, "err");
+	ASSERT_EQ(skel->bss->copied_tv_nsec, 555, "copied_tv_nsec");
+}
+
+static void run_auto_attach_test(struct bpf_program *prog, struct test_sleepable_tracepoints *skel)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		return;
+
+	run_test(skel);
+	bpf_link__destroy(link);
+}
+
+void test_sleepable_tracepoints(void)
+{
+	struct test_sleepable_tracepoints *skel;
+	struct bpf_link *link;
+	int err, i;
+
+	skel = test_sleepable_tracepoints__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	/* Primary functional tests: full bpf_copy_from_user exercise */
+	{
+		struct {
+			const char *name;
+			struct bpf_program *prog;
+		} func_tests[] = {
+			{ "tp_btf", skel->progs.handle_sys_enter_tp_btf },
+			{ "raw_tp", skel->progs.handle_sys_enter_raw_tp },
+			{ "tracepoint", skel->progs.handle_sys_enter_tp },
+			{ "sys_exit", skel->progs.handle_sys_exit_tp },
+		};
+
+		for (i = 0; i < ARRAY_SIZE(func_tests); i++) {
+			if (test__start_subtest(func_tests[i].name))
+				run_auto_attach_test(func_tests[i].prog, skel);
+		}
+	}
+
+	/* Attach-only tests: verify libbpf prefix parsing for aliases */
+	{
+		struct {
+			const char *name;
+			struct bpf_program *prog;
+		} attach_tests[] = {
+			{ "tracepoint_alias", skel->progs.handle_sys_enter_tp_alias },
+			{ "raw_tracepoint_alias", skel->progs.handle_sys_enter_raw_tp_alias },
+		};
+
+		for (i = 0; i < ARRAY_SIZE(attach_tests); i++) {
+			if (!test__start_subtest(attach_tests[i].name))
+				continue;
+			link = bpf_program__attach(attach_tests[i].prog);
+			if (ASSERT_OK_PTR(link, "attach"))
+				bpf_link__destroy(link);
+		}
+	}
+
+	/* Bare SEC variants: verify manual attach */
+
+	if (test__start_subtest("raw_tp_bare")) {
+		link = bpf_program__attach_raw_tracepoint(skel->progs.handle_raw_tp_bare,
+							  "sys_enter");
+		if (ASSERT_OK_PTR(link, "raw_tp_bare_attach"))
+			bpf_link__destroy(link);
+	}
+
+	if (test__start_subtest("tp_bare")) {
+		link = bpf_program__attach_tracepoint(skel->progs.handle_tp_bare, "syscalls",
+						      "sys_enter_nanosleep");
+		if (ASSERT_OK_PTR(link, "tp_bare_attach"))
+			bpf_link__destroy(link);
+	}
+
+	/* BPF_PROG_TEST_RUN: exercise bpf_prog_test_run_raw_tp() */
+	{
+		struct {
+			const char *name;
+			__u32 flags;
+			bool expect_err;
+		} run_tests[] = {
+			{ "test_run", 0, false },
+			{ "test_run_on_cpu_reject", BPF_F_TEST_RUN_ON_CPU, true },
+		};
+
+		for (i = 0; i < ARRAY_SIZE(run_tests); i++) {
+			__u64 args[2] = {0x1234ULL, 0x5678ULL};
+			LIBBPF_OPTS(bpf_test_run_opts, topts,
+				.ctx_in = args,
+				.ctx_size_in = sizeof(args),
+				.flags = run_tests[i].flags,
+			);
+			int fd;
+
+			if (!test__start_subtest(run_tests[i].name))
+				continue;
+
+			fd = bpf_program__fd(skel->progs.handle_test_run);
+			err = bpf_prog_test_run_opts(fd, &topts);
+			if (!run_tests[i].expect_err) {
+				ASSERT_OK(err, "test_run");
+				ASSERT_EQ(topts.retval, args[0] + args[1], "test_run_retval");
+			} else {
+				ASSERT_ERR(err, "test_run_err");
+			}
+		}
+	}
+
+	/* Negative: attach-time rejection on non-faultable tracepoints */
+	{
+		struct {
+			const char *name;
+			struct bpf_program *prog;
+		} neg_tests[] = {
+			{ "raw_tp_non_faultable", skel->progs.handle_raw_tp_non_faultable },
+			{ "tp_non_syscall", skel->progs.handle_tp_non_syscall },
+		};
+
+		for (i = 0; i < ARRAY_SIZE(neg_tests); i++) {
+			if (!test__start_subtest(neg_tests[i].name))
+				continue;
+			link = bpf_program__attach(neg_tests[i].prog);
+			ASSERT_ERR_PTR(link, "attach_should_fail");
+		}
+	}
+
+	test_sleepable_tracepoints__destroy(skel);
+
+	/* Negative: load-time rejection (separate BPF object) */
+	RUN_TESTS(test_sleepable_tracepoints_fail);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c
new file mode 100644
index 000000000000..907c04510a72
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <asm/unistd.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+int target_pid;
+int prog_triggered;
+long err;
+long copied_tv_nsec;
+
+static int copy_nanosleep_arg(struct __kernel_timespec *ts)
+{
+	long tv_nsec;
+
+	err = bpf_copy_from_user(&tv_nsec, sizeof(tv_nsec), &ts->tv_nsec);
+	if (err)
+		return err;
+
+	copied_tv_nsec = tv_nsec;
+	prog_triggered = 1;
+	return 0;
+}
+
+/* Primary functional tests: full bpf_copy_from_user exercise */
+
+SEC("tp_btf.s/sys_enter")
+int BPF_PROG(handle_sys_enter_tp_btf, struct pt_regs *regs, long id)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid ||
+	    id != __NR_nanosleep)
+		return 0;
+
+	return copy_nanosleep_arg((void *)PT_REGS_PARM1_SYSCALL(regs));
+}
+
+SEC("raw_tp.s/sys_enter")
+int BPF_PROG(handle_sys_enter_raw_tp, struct pt_regs *regs, long id)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid ||
+	    id != __NR_nanosleep)
+		return 0;
+
+	return copy_nanosleep_arg((void *)PT_REGS_PARM1_CORE_SYSCALL(regs));
+}
+
+SEC("tp.s/syscalls/sys_enter_nanosleep")
+int handle_sys_enter_tp(struct syscall_trace_enter *args)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	return copy_nanosleep_arg((void *)args->args[0]);
+}
+
+SEC("tp.s/syscalls/sys_exit_nanosleep")
+int handle_sys_exit_tp(struct syscall_trace_exit *args)
+{
+	struct pt_regs *regs;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	regs = (struct pt_regs *)bpf_task_pt_regs(bpf_get_current_task_btf());
+	return copy_nanosleep_arg((void *)PT_REGS_PARM1_CORE_SYSCALL(regs));
+}
+
+/* Bare SEC variants: test manual attach without tracepoint in section name */
+
+SEC("raw_tp.s")
+int BPF_PROG(handle_raw_tp_bare, struct pt_regs *regs, long id)
+{
+	return 0;
+}
+
+SEC("tp.s")
+int handle_tp_bare(void *ctx)
+{
+	return 0;
+}
+
+/* Alias SEC variants: test libbpf prefix parsing for long-form names */
+
+SEC("tracepoint.s/syscalls/sys_enter_nanosleep")
+int handle_sys_enter_tp_alias(struct syscall_trace_enter *args)
+{
+	return 0;
+}
+
+SEC("raw_tracepoint.s/sys_enter")
+int BPF_PROG(handle_sys_enter_raw_tp_alias, struct pt_regs *regs, long id)
+{
+	return 0;
+}
+
+/* BPF_PROG_TEST_RUN: sleepable raw_tp invoked via bpf_prog_test_run_raw_tp */
+
+SEC("raw_tp.s/sys_enter")
+int BPF_PROG(handle_test_run, struct pt_regs *regs, long id)
+{
+	if ((__u64)regs == 0x1234ULL && (__u64)id == 0x5678ULL)
+		return (__u64)regs + (__u64)id;
+
+	return 0;
+}
+
+/* Negative: sleepable on non-faultable tracepoint (attach-time rejection) */
+
+SEC("raw_tp.s/sched_switch")
+int BPF_PROG(handle_raw_tp_non_faultable, bool preempt,
+	     struct task_struct *prev, struct task_struct *next)
+{
+	return 0;
+}
+
+SEC("tp.s/sched/sched_switch")
+int handle_tp_non_syscall(void *ctx)
+{
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c
new file mode 100644
index 000000000000..1a0748a9520b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+/* Sleepable program on a non-faultable tracepoint should fail to load */
+SEC("tp_btf.s/sched_switch")
+__failure __msg("Sleepable program cannot attach to non-faultable tracepoint")
+int BPF_PROG(handle_sched_switch, bool preempt,
+	     struct task_struct *prev, struct task_struct *next)
+{
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/verifier/sleepable.c b/tools/testing/selftests/bpf/verifier/sleepable.c
index 1f0d2bdc673f..6dabc5522945 100644
--- a/tools/testing/selftests/bpf/verifier/sleepable.c
+++ b/tools/testing/selftests/bpf/verifier/sleepable.c
@@ -76,7 +76,20 @@
 	.runs = -1,
 },
 {
-	"sleepable raw tracepoint reject",
+	"sleepable raw tracepoint accept",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACING,
+	.expected_attach_type = BPF_TRACE_RAW_TP,
+	.kfunc = "sys_enter",
+	.result = ACCEPT,
+	.flags = BPF_F_SLEEPABLE,
+	.runs = -1,
+},
+{
+	"sleepable raw tracepoint reject non-faultable",
 	.insns = {
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
@@ -85,7 +98,7 @@
 	.expected_attach_type = BPF_TRACE_RAW_TP,
 	.kfunc = "sched_switch",
 	.result = REJECT,
-	.errstr = "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable",
+	.errstr = "Sleepable program cannot attach to non-faultable tracepoint",
 	.flags = BPF_F_SLEEPABLE,
 	.runs = -1,
 },

-- 
2.52.0



* Re: [PATCH bpf-next v9 3/6] bpf: Add sleepable support for classic tracepoint programs
  2026-04-10 17:09 ` [PATCH bpf-next v9 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
@ 2026-04-10 19:39   ` Alexei Starovoitov
  0 siblings, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2026-04-10 19:39 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko

On Fri, Apr 10, 2026 at 10:09 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
> faultable tracepoints that supports sleepable BPF programs. It uses
> rcu_tasks_trace for lifetime protection and
> bpf_prog_run_array_sleepable() for per-program RCU flavor selection,
> following the uprobe_prog_run() pattern.
>
> Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
> programs before perf event processing. Previously, BPF ran after the
> per-cpu perf trace buffer was allocated under preempt_disable,
> requiring cleanup via perf_swevent_put_recursion_context() on filter.
> Now BPF runs in faultable context before preempt_disable, reading
> syscall arguments from local variables instead of the per-cpu trace
> record, removing the dependency on buffer allocation. This allows
> sleepable BPF programs to execute and avoids unnecessary buffer
> allocation when BPF filters the event. The perf event submission
> path (buffer allocation, fill, submit) remains under preempt_disable
> as before. Since BPF no longer runs within the buffer allocation
> context, the fake_regs output parameter to perf_trace_buf_alloc()
> is no longer needed and is replaced with NULL.
>
> Add an attach-time check in __perf_event_set_bpf_prog() to reject
> sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
> tracepoints, since only syscall tracepoints run in faultable context.
>
> This prepares the classic tracepoint runtime and attach paths for
> sleepable programs. The verifier changes to allow loading sleepable
> BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch.
>
> To: Peter Zijlstra <peterz@infradead.org>
> To: Steven Rostedt <rostedt@goodmis.org>
> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # for BPF bits
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
>  include/linux/trace_events.h  |   6 +++
>  kernel/events/core.c          |   9 ++++
>  kernel/trace/bpf_trace.c      |  28 +++++++++++
>  kernel/trace/trace_syscalls.c | 110 ++++++++++++++++++++++--------------------
>  4 files changed, 101 insertions(+), 52 deletions(-)

Peter,

I'm taking it into bpf-next, since the refactor is minor and looks
correct to me.


* Re: [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-10 17:09 ` [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
@ 2026-04-10 22:55   ` Alexei Starovoitov
  2026-04-13 12:55     ` Mykyta Yatsenko
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2026-04-10 22:55 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko

On Fri, Apr 10, 2026 at 10:09 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> From: Mykyta Yatsenko <yatsenko@meta.com>
>
> Add bpf_prog_run_array_sleepable() for running BPF program arrays
> on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
> includes per-program recursion checking for private stack safety
> and hardcodes is_uprobe to false.
>
> Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.
>
> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> ---
>  include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 0136a108d083..4e166accab35 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3077,6 +3077,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>  void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
>  void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
>
> +static __always_inline u32
> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
> +                            const void *ctx, bpf_prog_run_fn run_prog)
> +{
> +       const struct bpf_prog_array_item *item;
> +       struct bpf_prog *prog;
> +       struct bpf_run_ctx *old_run_ctx;
> +       struct bpf_trace_run_ctx run_ctx;
> +       u32 ret = 1;
> +
> +       might_fault();
> +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> +
> +       if (unlikely(!array))
> +               return ret;
> +
> +       migrate_disable();
> +
> +       run_ctx.is_uprobe = false;
> +
> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +       item = &array->items[0];
> +       while ((prog = READ_ONCE(item->prog))) {
> +               if (!prog->sleepable)
> +                       rcu_read_lock();
> +
> +               /* Per-prog recursion check to enable private stack. */
> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {

from sashiko

> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +       item = &array->items[0];
> +       while ((prog = READ_ONCE(item->prog))) {
> +               if (!prog->sleepable)
> +                       rcu_read_lock();
> +
> +               /* Per-prog recursion check to enable private stack. */
> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {

Can this cause a panic by dereferencing dummy_bpf_prog.prog.active?
When a program is detached from a BPF array and memory allocation for the new
array fails, bpf_prog_array_delete_safe() replaces the detached program with
&dummy_bpf_prog.prog as a fallback.
Because dummy_bpf_prog is a statically allocated placeholder, its prog.active
field is uninitialized (NULL).
If prog is dummy_bpf_prog.prog, bpf_prog_get_recursion_context() will
dereference prog->active as a per-CPU pointer, accessing offset 0 in the
per-CPU area and causing memory corruption or a panic.
Should there be an explicit check to skip dummy_bpf_prog.prog here?
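
For reference, the fallback being described is roughly the following
(paraphrasing bpf_prog_array_delete_safe() in kernel/bpf/core.c from memory,
not quoting it verbatim):

        void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
                                        struct bpf_prog *old_prog)
        {
                struct bpf_prog_array_item *item;

                /* Replace the detached program with a static placeholder so
                 * the array layout does not change; callers fall back to this
                 * when allocating a new, smaller array is not possible.
                 */
                for (item = array->items; item->prog; item++)
                        if (item->prog == old_prog) {
                                WRITE_ONCE(item->prog, &dummy_bpf_prog.prog);
                                break;
                        }
        }

so any per-program bookkeeping done while walking the array has to tolerate
hitting &dummy_bpf_prog.prog.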

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-10 22:55   ` Alexei Starovoitov
@ 2026-04-13 12:55     ` Mykyta Yatsenko
  2026-04-13 16:25       ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-13 12:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko



On 4/10/26 11:55 PM, Alexei Starovoitov wrote:
> On Fri, Apr 10, 2026 at 10:09 AM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>>
>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> Add bpf_prog_run_array_sleepable() for running BPF program arrays
>> on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
>> includes per-program recursion checking for private stack safety
>> and hardcodes is_uprobe to false.
>>
>> Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.
>>
>> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>> ---
>>   include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 50 insertions(+)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 0136a108d083..4e166accab35 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -3077,6 +3077,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>>   void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
>>   void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
>>
>> +static __always_inline u32
>> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
>> +                            const void *ctx, bpf_prog_run_fn run_prog)
>> +{
>> +       const struct bpf_prog_array_item *item;
>> +       struct bpf_prog *prog;
>> +       struct bpf_run_ctx *old_run_ctx;
>> +       struct bpf_trace_run_ctx run_ctx;
>> +       u32 ret = 1;
>> +
>> +       might_fault();
>> +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
>> +
>> +       if (unlikely(!array))
>> +               return ret;
>> +
>> +       migrate_disable();
>> +
>> +       run_ctx.is_uprobe = false;
>> +
>> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>> +       item = &array->items[0];
>> +       while ((prog = READ_ONCE(item->prog))) {
>> +               if (!prog->sleepable)
>> +                       rcu_read_lock();
>> +
>> +               /* Per-prog recursion check to enable private stack. */
>> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
> 
> from sashiko
> 
>> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>> +       item = &array->items[0];
>> +       while ((prog = READ_ONCE(item->prog))) {
>> +               if (!prog->sleepable)
>> +                       rcu_read_lock();
>> +
>> +               /* Per-prog recursion check to enable private stack. */
>> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
> 
> Can this cause a panic by dereferencing dummy_bpf_prog.prog.active?
> When a program is detached from a BPF array and memory allocation for the new
> array fails, bpf_prog_array_delete_safe() replaces the detached program with
> &dummy_bpf_prog.prog as a fallback.
> Because dummy_bpf_prog is a statically allocated placeholder, its prog.active
> field is uninitialized (NULL).
> If prog is dummy_bpf_prog.prog, bpf_prog_get_recursion_context() will
> dereference prog->active as a per-CPU pointer, accessing offset 0 in the
> per-CPU area and causing memory corruption or a panic.
> Should there be an explicit check to skip dummy_bpf_prog.prog here?

Looks like a real issue, thanks. I think the best solution is to add a 
valid `prog->active` field for dummy_bpf_prog.prog so we don't need 
to maintain special branches and can rely on it being a valid bpf_prog.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-13 12:55     ` Mykyta Yatsenko
@ 2026-04-13 16:25       ` Alexei Starovoitov
  2026-04-13 17:32         ` Mykyta Yatsenko
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2026-04-13 16:25 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko

On Mon, Apr 13, 2026 at 5:55 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
>
>
> On 4/10/26 11:55 PM, Alexei Starovoitov wrote:
> > On Fri, Apr 10, 2026 at 10:09 AM Mykyta Yatsenko
> > <mykyta.yatsenko5@gmail.com> wrote:
> >>
> >> From: Mykyta Yatsenko <yatsenko@meta.com>
> >>
> >> Add bpf_prog_run_array_sleepable() for running BPF program arrays
> >> on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
> >> includes per-program recursion checking for private stack safety
> >> and hardcodes is_uprobe to false.
> >>
> >> Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.
> >>
> >> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> >> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> >> ---
> >>   include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 50 insertions(+)
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index 0136a108d083..4e166accab35 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -3077,6 +3077,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> >>   void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
> >>   void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
> >>
> >> +static __always_inline u32
> >> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
> >> +                            const void *ctx, bpf_prog_run_fn run_prog)
> >> +{
> >> +       const struct bpf_prog_array_item *item;
> >> +       struct bpf_prog *prog;
> >> +       struct bpf_run_ctx *old_run_ctx;
> >> +       struct bpf_trace_run_ctx run_ctx;
> >> +       u32 ret = 1;
> >> +
> >> +       might_fault();
> >> +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> >> +
> >> +       if (unlikely(!array))
> >> +               return ret;
> >> +
> >> +       migrate_disable();
> >> +
> >> +       run_ctx.is_uprobe = false;
> >> +
> >> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >> +       item = &array->items[0];
> >> +       while ((prog = READ_ONCE(item->prog))) {
> >> +               if (!prog->sleepable)
> >> +                       rcu_read_lock();
> >> +
> >> +               /* Per-prog recursion check to enable private stack. */
> >> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
> >
> > from sashiko
> >
> >> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >> +       item = &array->items[0];
> >> +       while ((prog = READ_ONCE(item->prog))) {
> >> +               if (!prog->sleepable)
> >> +                       rcu_read_lock();
> >> +
> >> +               /* Per-prog recursion check to enable private stack. */
> >> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
> >
> > Can this cause a panic by dereferencing dummy_bpf_prog.prog.active?
> > When a program is detached from a BPF array and memory allocation for the new
> > array fails, bpf_prog_array_delete_safe() replaces the detached program with
> > &dummy_bpf_prog.prog as a fallback.
> > Because dummy_bpf_prog is a statically allocated placeholder, its prog.active
> > field is uninitialized (NULL).
> > If prog is dummy_bpf_prog.prog, bpf_prog_get_recursion_context() will
> > dereference prog->active as a per-CPU pointer, accessing offset 0 in the
> > per-CPU area and causing memory corruption or a panic.
> > Should there be an explicit check to skip dummy_bpf_prog.prog here?
>
> Looks like a real issue, thanks. I think the best solution is to add a
> valid `prog->active` field for dummy_bpf_prog.prog so we don't need
> to maintain special branches and can rely on it being a valid bpf_prog.

No. Don't copy-paste from claude. Ask it to do the homework.
This was already discussed on the list.
I mean 'oh it's NULL, let's add it'. There was a similar issue elsewhere
and it was solved differently.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-13 16:25       ` Alexei Starovoitov
@ 2026-04-13 17:32         ` Mykyta Yatsenko
  2026-04-13 20:05           ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-13 17:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko



On 4/13/26 5:25 PM, Alexei Starovoitov wrote:
> On Mon, Apr 13, 2026 at 5:55 AM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>>
>>
>>
>> On 4/10/26 11:55 PM, Alexei Starovoitov wrote:
>>> On Fri, Apr 10, 2026 at 10:09 AM Mykyta Yatsenko
>>> <mykyta.yatsenko5@gmail.com> wrote:
>>>>
>>>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>>>
>>>> Add bpf_prog_run_array_sleepable() for running BPF program arrays
>>>> on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
>>>> includes per-program recursion checking for private stack safety
>>>> and hardcodes is_uprobe to false.
>>>>
>>>> Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.
>>>>
>>>> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>>>> ---
>>>>    include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 50 insertions(+)
>>>>
>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>> index 0136a108d083..4e166accab35 100644
>>>> --- a/include/linux/bpf.h
>>>> +++ b/include/linux/bpf.h
>>>> @@ -3077,6 +3077,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>>>>    void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
>>>>    void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
>>>>
>>>> +static __always_inline u32
>>>> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
>>>> +                            const void *ctx, bpf_prog_run_fn run_prog)
>>>> +{
>>>> +       const struct bpf_prog_array_item *item;
>>>> +       struct bpf_prog *prog;
>>>> +       struct bpf_run_ctx *old_run_ctx;
>>>> +       struct bpf_trace_run_ctx run_ctx;
>>>> +       u32 ret = 1;
>>>> +
>>>> +       might_fault();
>>>> +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
>>>> +
>>>> +       if (unlikely(!array))
>>>> +               return ret;
>>>> +
>>>> +       migrate_disable();
>>>> +
>>>> +       run_ctx.is_uprobe = false;
>>>> +
>>>> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>>>> +       item = &array->items[0];
>>>> +       while ((prog = READ_ONCE(item->prog))) {
>>>> +               if (!prog->sleepable)
>>>> +                       rcu_read_lock();
>>>> +
>>>> +               /* Per-prog recursion check to enable private stack. */
>>>> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
>>>
>>> from sashiko
>>>
>>>> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>>>> +       item = &array->items[0];
>>>> +       while ((prog = READ_ONCE(item->prog))) {
>>>> +               if (!prog->sleepable)
>>>> +                       rcu_read_lock();
>>>> +
>>>> +               /* Per-prog recursion check to enable private stack. */
>>>> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
>>>
>>> Can this cause a panic by dereferencing dummy_bpf_prog.prog.active?
>>> When a program is detached from a BPF array and memory allocation for the new
>>> array fails, bpf_prog_array_delete_safe() replaces the detached program with
>>> &dummy_bpf_prog.prog as a fallback.
>>> Because dummy_bpf_prog is a statically allocated placeholder, its prog.active
>>> field is uninitialized (NULL).
>>> If prog is dummy_bpf_prog.prog, bpf_prog_get_recursion_context() will
>>> dereference prog->active as a per-CPU pointer, accessing offset 0 in the
>>> per-CPU area and causing memory corruption or a panic.
>>> Should there be an explicit check to skip dummy_bpf_prog.prog here?
>>
>> Looks like a real issue, thanks. I think the best solution is to add a
>> valid `prog->active` field for dummy_bpf_prog.prog so we don't need
>> to maintain special branches and can rely on it being a valid bpf_prog.
> 
> No. Don't copy-paste from claude. Ask it to do the homework.
> This was already discussed on the list.
> I mean 'oh it's NULL, let's add it'. There was a similar issue elsewhere
> and it was solved differently.

Actually, claude suggested simply checking if prog == &dummy_bpf_prog.prog, 
similarly to how it's done in kernel/bpf/core.c (for example in 
bpf_prog_array_copy_core()), but to me adding a valid `active` field 
sounds like a more future-proof solution.
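
The pattern I was referring to is roughly this (paraphrasing
bpf_prog_array_copy_core() from memory, not an exact quote):

        for (item = array->items; item->prog; item++) {
                /* explicitly skip the static placeholder slot */
                if (item->prog == &dummy_bpf_prog.prog)
                        continue;
                prog_ids[i++] = item->prog->aux->id;
        }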




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-13 17:32         ` Mykyta Yatsenko
@ 2026-04-13 20:05           ` Alexei Starovoitov
  0 siblings, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2026-04-13 20:05 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin Lau, Kernel Team, Eduard, Kumar Kartikeya Dwivedi,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko

On Mon, Apr 13, 2026 at 10:32 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
>
>
> On 4/13/26 5:25 PM, Alexei Starovoitov wrote:
> > On Mon, Apr 13, 2026 at 5:55 AM Mykyta Yatsenko
> > <mykyta.yatsenko5@gmail.com> wrote:
> >>
> >>
> >>
> >> On 4/10/26 11:55 PM, Alexei Starovoitov wrote:
> >>> On Fri, Apr 10, 2026 at 10:09 AM Mykyta Yatsenko
> >>> <mykyta.yatsenko5@gmail.com> wrote:
> >>>>
> >>>> From: Mykyta Yatsenko <yatsenko@meta.com>
> >>>>
> >>>> Add bpf_prog_run_array_sleepable() for running BPF program arrays
> >>>> on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
> >>>> includes per-program recursion checking for private stack safety
> >>>> and hardcodes is_uprobe to false.
> >>>>
> >>>> Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.
> >>>>
> >>>> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> >>>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
> >>>> ---
> >>>>    include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>    1 file changed, 50 insertions(+)
> >>>>
> >>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >>>> index 0136a108d083..4e166accab35 100644
> >>>> --- a/include/linux/bpf.h
> >>>> +++ b/include/linux/bpf.h
> >>>> @@ -3077,6 +3077,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
> >>>>    void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
> >>>>    void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
> >>>>
> >>>> +static __always_inline u32
> >>>> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
> >>>> +                            const void *ctx, bpf_prog_run_fn run_prog)
> >>>> +{
> >>>> +       const struct bpf_prog_array_item *item;
> >>>> +       struct bpf_prog *prog;
> >>>> +       struct bpf_run_ctx *old_run_ctx;
> >>>> +       struct bpf_trace_run_ctx run_ctx;
> >>>> +       u32 ret = 1;
> >>>> +
> >>>> +       might_fault();
> >>>> +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> >>>> +
> >>>> +       if (unlikely(!array))
> >>>> +               return ret;
> >>>> +
> >>>> +       migrate_disable();
> >>>> +
> >>>> +       run_ctx.is_uprobe = false;
> >>>> +
> >>>> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >>>> +       item = &array->items[0];
> >>>> +       while ((prog = READ_ONCE(item->prog))) {
> >>>> +               if (!prog->sleepable)
> >>>> +                       rcu_read_lock();
> >>>> +
> >>>> +               /* Per-prog recursion check to enable private stack. */
> >>>> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
> >>>
> >>> from sashiko
> >>>
> >>>> +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >>>> +       item = &array->items[0];
> >>>> +       while ((prog = READ_ONCE(item->prog))) {
> >>>> +               if (!prog->sleepable)
> >>>> +                       rcu_read_lock();
> >>>> +
> >>>> +               /* Per-prog recursion check to enable private stack. */
> >>>> +               if (unlikely(!bpf_prog_get_recursion_context(prog))) {
> >>>
> >>> Can this cause a panic by dereferencing dummy_bpf_prog.prog.active?
> >>> When a program is detached from a BPF array and memory allocation for the new
> >>> array fails, bpf_prog_array_delete_safe() replaces the detached program with
> >>> &dummy_bpf_prog.prog as a fallback.
> >>> Because dummy_bpf_prog is a statically allocated placeholder, its prog.active
> >>> field is uninitialized (NULL).
> >>> If prog is dummy_bpf_prog.prog, bpf_prog_get_recursion_context() will
> >>> dereference prog->active as a per-CPU pointer, accessing offset 0 in the
> >>> per-CPU area and causing memory corruption or a panic.
> >>> Should there be an explicit check to skip dummy_bpf_prog.prog here?
> >>
> >> Looks like a real issue, thanks. I think the best solution is to add a
> >> valid `prog->active` field for dummy_bpf_prog.prog so we don't need
> >> to maintain special branches and can rely on it being a valid bpf_prog.
> >
> > No. Don't copy-paste from claude. Ask it to do the homework.
> > This was already discussed on the list.
> > I mean 'oh it's NULL, let's add it'. There was a similar issue elsewhere
> > and it was solved differently.
>
> Actually, claude suggested simply checking if prog == &dummy_bpf_prog.prog,
> similarly to how it's done in kernel/bpf/core.c (for example in
> bpf_prog_array_copy_core()), but to me adding a valid `active` field
> sounds like a more future-proof solution.

Sigh.
Ask claude to do more homework.
Don't be satisfied with the first answer.
Hint: prog->active is not the only one.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-04-13 20:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-10 17:09 [PATCH bpf-next v9 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
2026-04-10 17:09 ` [PATCH bpf-next v9 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
2026-04-10 17:09 ` [PATCH bpf-next v9 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
2026-04-10 22:55   ` Alexei Starovoitov
2026-04-13 12:55     ` Mykyta Yatsenko
2026-04-13 16:25       ` Alexei Starovoitov
2026-04-13 17:32         ` Mykyta Yatsenko
2026-04-13 20:05           ` Alexei Starovoitov
2026-04-10 17:09 ` [PATCH bpf-next v9 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
2026-04-10 19:39   ` Alexei Starovoitov
2026-04-10 17:09 ` [PATCH bpf-next v9 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
2026-04-10 17:09 ` [PATCH bpf-next v9 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
2026-04-10 17:09 ` [PATCH bpf-next v9 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox