public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs
@ 2026-04-22 15:27 Mykyta Yatsenko
  2026-04-22 15:27 ` [PATCH bpf-next v12 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 15:27 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

This series adds support for sleepable BPF programs attached to raw
tracepoints (tp_btf, raw_tp) and classic tracepoints (tp).
The motivation is to allow BPF programs on syscall
tracepoints to use sleepable helpers such as bpf_copy_from_user(),
enabling reliable user memory reads that can page-fault.
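For illustration, a sleepable tp_btf program using bpf_copy_from_user()
could look like the sketch below. This is not taken from the selftests:
the SEC() macro and the helper are stubbed out so the snippet stands
alone, and the ctx layout (ctx[1] as the user buffer pointer) is an
assumption made for the sketch.

```c
#include <string.h>

#define SEC(name)  /* stand-in for libbpf's __attribute__((section(name))) */

/* Stub for the kernel helper: the real bpf_copy_from_user() may
 * page-fault and sleep, which is why the program must be sleepable. */
static long bpf_copy_from_user(void *dst, unsigned int size,
			       const void *user_ptr)
{
	memcpy(dst, user_ptr, size);
	return 0;
}

/* The ".s" suffix in the section name marks the program sleepable. */
SEC("tp_btf.s/sys_enter_getcwd")
int handle_sys_enter(unsigned long *ctx)
{
	char path[64] = {};

	/* Assumption for the sketch: ctx[1] carries the user buffer pointer. */
	if (bpf_copy_from_user(path, sizeof(path) - 1, (const void *)ctx[1]))
		return 0;
	return path[0] != '\0';
}
```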

This series removes the restriction that programs attached to faultable
tracepoints cannot be sleepable:

Patch 1 modifies __bpf_trace_run() to support sleepable programs.

Patch 2 introduces bpf_prog_run_array_sleepable() to support the new use case.

Patch 3 adds sleepable support for classic tracepoints
(BPF_PROG_TYPE_TRACEPOINT) by introducing trace_call_bpf_faultable()
and restructuring perf_syscall_enter/exit() to run BPF programs in
faultable context.

Patch 4 allows BPF_TRACE_RAW_TP, BPF_PROG_TYPE_RAW_TRACEPOINT, and
BPF_PROG_TYPE_TRACEPOINT programs to be loaded as sleepable, with
load-time and attach-time checks to reject sleepable programs on
non-faultable tracepoints.

Patch 5 adds libbpf SEC_DEF handlers: tp_btf.s, raw_tp.s,
raw_tracepoint.s, tp.s, and tracepoint.s.

Patch 6 adds selftests covering tp_btf.s, raw_tp.s, and tp.s positive
cases using bpf_copy_from_user() plus negative tests for non-faultable
tracepoints.
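
The new section names in patch 5 all follow the existing libbpf
convention of a ".s" suffix on the program-type prefix. A toy matcher
for that convention (not libbpf's actual SEC_DEF parser) would be:

```c
#include <stdbool.h>
#include <string.h>

/* Detect the sleepable variants listed above by the ".s" marker at the
 * end of the section prefix, e.g. "tp_btf.s/...", "raw_tp.s/...". */
static bool sec_is_sleepable(const char *sec)
{
	const char *slash = strchr(sec, '/');
	size_t plen = slash ? (size_t)(slash - sec) : strlen(sec);

	return plen >= 2 && strncmp(sec + plen - 2, ".s", 2) == 0;
}
```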

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v12:
- Style improvements in bpf_prog_run_array_sleepable(): use guard(rcu)() and
remove unnecessary defensive-programming artifacts.
- Link to v11: https://patch.msgid.link/20260421-sleepable_tracepoints-v11-0-d8ff138d6f05@meta.com

Changes in v11:
- Avoid running the dummy prog in bpf_prog_run_array_sleepable()
- Migrate selftests from nanosleep() to getcwd() to avoid issues with
differing struct layouts.
- Link to v10: https://patch.msgid.link/20260415-sleepable_tracepoints-v10-0-161f40b33dd7@meta.com

Changes in v10:
- Guard per-prog recursion check in bpf_prog_run_array_sleepable()
  with prog->active NULL check, following the same pattern as
  commit 7dc211c1159d for prog->stats. dummy_bpf_prog has NULL
  active field and can appear in the array via
  bpf_prog_array_delete_safe() fallback on allocation failure.
- Link to v9: https://patch.msgid.link/20260410-sleepable_tracepoints-v9-0-e719e664e84c@meta.com

Changes in v9:
- Fixed "classic raw tracepoints" to "raw tracepoints (tp_btf, raw_tp)"
in the commit message
- Added bpf_prog_get_recursion_context() guard to
__bpf_prog_test_run_raw_tp() to protect per-CPU private stack from
concurrent sleepable test runs
- Added new bpf_prog_run_array_sleepable() without the is_uprobe parameter,
removed all changes to bpf_prog_run_array_uprobe()
- Refactored attach_tp() to use prefix array uniformly (matching
attach_raw_tp() pattern), removing hardcoded strcmp() bare-name checks.
- Added recursion check in __bpf_prog_test_run_raw_tp()
- Refactored selftests
- Link to v8: https://patch.msgid.link/20260330-sleepable_tracepoints-v8-0-2e323467f3a0@meta.com

Changes in v8:
- Fix sleepable tracepoint support in bpf_prog_test_run() (Kumar, sashiko)
- Link to v7: https://patch.msgid.link/20260325-sleepable_tracepoints-v6-0-2b182dacea13@meta.com

Changes in v7:
- Add recursion check (bpf_prog_get_recursion_context()) to make sure
private stack is safe when sleepable program is preempted by itself
(Alexei, Kumar)
- Use combined rcu_read_lock_dont_migrate() instead of separate
rcu_read_lock()/migrate_disable() calls for non-sleepable path (Alexei)
- Link to v6: https://lore.kernel.org/bpf/20260324-sleepable_tracepoints-v6-0-81bab3a43f25@meta.com/

Changes in v6:
- Remove the recursion check from trace_call_bpf_faultable(): sleepable
tracepoints are called from syscall enter/exit, so no recursion is
possible. (Kumar)
- Refactor bpf_prog_run_array_uprobe() to support the tracepoints
use case cleanly (Kumar)
- Link to v5: https://lore.kernel.org/r/20260316-sleepable_tracepoints-v5-0-85525de71d25@meta.com

Changes in v5:
- Addressed AI review: zero-initialize struct pt_regs in
perf_call_bpf_enter(); changed handling of tp.s and tracepoint.s in
attach_tp() in libbpf.
- Updated commit messages
- Link to v4: https://lore.kernel.org/r/20260313-sleepable_tracepoints-v4-0-debc688a66b3@meta.com

Changes in v4:
- Follow uprobe_prog_run() pattern with explicit rcu_read_lock_trace()
  instead of relying on outer rcu_tasks_trace lock
- Add sleepable support for classic raw tracepoints (raw_tp.s)
- Add sleepable support for classic tracepoints (tp.s) with new
  trace_call_bpf_faultable() and restructured perf_syscall_enter/exit()
- Add raw_tp.s, raw_tracepoint.s, tp.s, tracepoint.s SEC_DEF handlers
- Replace growing type enumeration in error message with generic
  "program of this type cannot be sleepable"
- Use PT_REGS_PARM1_SYSCALL (non-CO-RE) in BTF test
- Add classic raw_tp and classic tracepoint sleepable tests
- Link to v3: https://lore.kernel.org/r/20260311-sleepable_tracepoints-v3-0-3e9bbde5bd22@meta.com

Changes in v3:
  - Moved faultable tracepoint check from attach time to load time in
    bpf_check_attach_target(), providing a clear verifier error message
  - Folded preempt_disable removal into the sleepable execution path
    patch
  - Used RUN_TESTS() with __failure/__msg for negative test case instead
    of explicit userspace program
  - Reduced series from 6 patches to 4
  - Link to v2: https://lore.kernel.org/r/20260225-sleepable_tracepoints-v2-0-0330dafd650f@meta.com

Changes in v2:
  - Address AI review points - modified the order of the patches
  - Link to v1: https://lore.kernel.org/bpf/20260218-sleepable_tracepoints-v1-0-ec2705497208@meta.com/

---
Mykyta Yatsenko (6):
      bpf: Add sleepable support for raw tracepoint programs
      bpf: Add bpf_prog_run_array_sleepable()
      bpf: Add sleepable support for classic tracepoint programs
      bpf: Verifier support for sleepable tracepoint programs
      libbpf: Add section handlers for sleepable tracepoints
      selftests/bpf: Add tests for sleepable tracepoint programs

 include/linux/bpf.h                                |  50 ++++++++
 include/linux/trace_events.h                       |   6 +
 include/trace/bpf_probe.h                          |   2 -
 kernel/bpf/syscall.c                               |   5 +
 kernel/bpf/verifier.c                              |  13 +-
 kernel/events/core.c                               |   9 ++
 kernel/trace/bpf_trace.c                           |  48 ++++++-
 kernel/trace/trace_syscalls.c                      | 110 ++++++++--------
 net/bpf/test_run.c                                 |  65 +++++++---
 tools/lib/bpf/libbpf.c                             |  88 ++++++++-----
 .../bpf/prog_tests/sleepable_tracepoints.c         | 142 +++++++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints.c         | 112 ++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints_fail.c    |  18 +++
 tools/testing/selftests/bpf/verifier/sleepable.c   |  17 ++-
 14 files changed, 578 insertions(+), 107 deletions(-)
---
base-commit: 05103382104b8ffc701f7b2c79379a0361ecad30
change-id: 20260216-sleepable_tracepoints-381ae1410550

Best regards,
-- 
Mykyta Yatsenko <yatsenko@meta.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v12 1/6] bpf: Add sleepable support for raw tracepoint programs
  2026-04-22 15:27 [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
@ 2026-04-22 15:27 ` Mykyta Yatsenko
  2026-04-22 21:43   ` sashiko-bot
  2026-04-22 15:27 ` [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 15:27 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Rework __bpf_trace_run() to support sleepable BPF programs by using
explicit RCU flavor selection, following the uprobe_prog_run() pattern.

For sleepable programs, use rcu_read_lock_tasks_trace() for lifetime
protection with migrate_disable(). For non-sleepable programs, use the
regular rcu_read_lock_dont_migrate().

Remove the preempt_disable_notrace/preempt_enable_notrace pair from
the faultable tracepoint BPF probe wrapper in bpf_probe.h, since
migration protection and RCU locking are now handled per-program
inside __bpf_trace_run().

Adapt bpf_prog_test_run_raw_tp() for sleepable programs: reject
BPF_F_TEST_RUN_ON_CPU since sleepable programs cannot run in hardirq
or preempt-disabled context, and call __bpf_prog_test_run_raw_tp()
directly instead of via smp_call_function_single(). Rework
__bpf_prog_test_run_raw_tp() to select RCU flavor per-program and
add per-program recursion context guard for private stack safety.
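
The BPF_F_TEST_RUN_ON_CPU rejection reduces to a few lines. A condensed
sketch follows; the flag value matches the BPF UAPI, everything else is
illustrative rather than kernel code:

```c
#include <errno.h>
#include <stdbool.h>

#define BPF_F_TEST_RUN_ON_CPU	(1U << 0)	/* as in the BPF UAPI */

/* Sleepable programs may block, so they cannot be dispatched via
 * smp_call_function_single() to a specific CPU; reject the flag. */
static int check_test_run_flags(bool sleepable, unsigned int flags)
{
	if (sleepable && (flags & BPF_F_TEST_RUN_ON_CPU))
		return -EINVAL;
	return 0;
}
```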

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/trace/bpf_probe.h |  2 --
 kernel/trace/bpf_trace.c  | 20 ++++++++++++---
 net/bpf/test_run.c        | 65 ++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 9391d54d3f12..d1de8f9aa07f 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -58,9 +58,7 @@ static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
 	might_fault();							\
-	preempt_disable_notrace();					\
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));	\
-	preempt_enable_notrace();					\
 }
 
 #undef DECLARE_EVENT_SYSCALL_CLASS
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e916f0ccbed9..7276c72c1d31 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2072,11 +2072,19 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 static __always_inline
 void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 {
+	struct srcu_ctr __percpu *scp = NULL;
 	struct bpf_prog *prog = link->link.prog;
+	bool sleepable = prog->sleepable;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
 
-	rcu_read_lock_dont_migrate();
+	if (sleepable) {
+		scp = rcu_read_lock_tasks_trace();
+		migrate_disable();
+	} else {
+		rcu_read_lock_dont_migrate();
+	}
+
 	if (unlikely(!bpf_prog_get_recursion_context(prog))) {
 		bpf_prog_inc_misses_counter(prog);
 		goto out;
@@ -2085,12 +2093,18 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 	run_ctx.bpf_cookie = link->cookie;
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 
-	(void) bpf_prog_run(prog, args);
+	(void)bpf_prog_run(prog, args);
 
 	bpf_reset_run_ctx(old_run_ctx);
 out:
 	bpf_prog_put_recursion_context(prog);
-	rcu_read_unlock_migrate();
+
+	if (sleepable) {
+		migrate_enable();
+		rcu_read_unlock_tasks_trace(scp);
+	} else {
+		rcu_read_unlock_migrate();
+	}
 }
 
 #define UNPACK(...)			__VA_ARGS__
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2bc04feadfab..c9aea7052ba7 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -748,14 +748,35 @@ static void
 __bpf_prog_test_run_raw_tp(void *data)
 {
 	struct bpf_raw_tp_test_run_info *info = data;
+	struct srcu_ctr __percpu *scp = NULL;
 	struct bpf_trace_run_ctx run_ctx = {};
 	struct bpf_run_ctx *old_run_ctx;
 
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 
-	rcu_read_lock();
+	if (info->prog->sleepable) {
+		scp = rcu_read_lock_tasks_trace();
+		migrate_disable();
+	} else {
+		rcu_read_lock();
+	}
+
+	if (unlikely(!bpf_prog_get_recursion_context(info->prog))) {
+		bpf_prog_inc_misses_counter(info->prog);
+		goto out;
+	}
+
 	info->retval = bpf_prog_run(info->prog, info->ctx);
-	rcu_read_unlock();
+
+out:
+	bpf_prog_put_recursion_context(info->prog);
+
+	if (info->prog->sleepable) {
+		migrate_enable();
+		rcu_read_unlock_tasks_trace(scp);
+	} else {
+		rcu_read_unlock();
+	}
 
 	bpf_reset_run_ctx(old_run_ctx);
 }
@@ -783,6 +804,13 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 	if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 && cpu != 0)
 		return -EINVAL;
 
+	/*
+	 * Sleepable programs cannot run with preemption disabled or in
+	 * hardirq context (smp_call_function_single()), so reject the flag.
+	 */
+	if (prog->sleepable && (kattr->test.flags & BPF_F_TEST_RUN_ON_CPU))
+		return -EINVAL;
+
 	if (ctx_size_in) {
 		info.ctx = memdup_user(ctx_in, ctx_size_in);
 		if (IS_ERR(info.ctx))
@@ -791,24 +819,31 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 		info.ctx = NULL;
 	}
 
+	info.retval = 0;
 	info.prog = prog;
 
-	current_cpu = get_cpu();
-	if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 ||
-	    cpu == current_cpu) {
+	if (prog->sleepable) {
 		__bpf_prog_test_run_raw_tp(&info);
-	} else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		/* smp_call_function_single() also checks cpu_online()
-		 * after csd_lock(). However, since cpu is from user
-		 * space, let's do an extra quick check to filter out
-		 * invalid value before smp_call_function_single().
-		 */
-		err = -ENXIO;
 	} else {
-		err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp,
-					       &info, 1);
+		current_cpu = get_cpu();
+		if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 ||
+		    cpu == current_cpu) {
+			__bpf_prog_test_run_raw_tp(&info);
+		} else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+			/*
+			 * smp_call_function_single() also checks cpu_online()
+			 * after csd_lock(). However, since cpu is from user
+			 * space, let's do an extra quick check to filter out
+			 * invalid value before smp_call_function_single().
+			 */
+			err = -ENXIO;
+		} else {
+			err = smp_call_function_single(cpu,
+						       __bpf_prog_test_run_raw_tp,
+						       &info, 1);
+		}
+		put_cpu();
 	}
-	put_cpu();
 
 	if (!err &&
 	    copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32)))

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 15:27 [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
  2026-04-22 15:27 ` [PATCH bpf-next v12 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
@ 2026-04-22 15:27 ` Mykyta Yatsenko
  2026-04-22 16:06   ` bot+bpf-ci
  2026-04-22 22:02   ` sashiko-bot
  2026-04-22 15:27 ` [PATCH bpf-next v12 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 15:27 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add bpf_prog_run_array_sleepable() for running BPF program arrays
on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
includes per-program recursion checking for private stack safety
and hardcodes is_uprobe to false.

Skip dummy_bpf_prog at the top of the loop. When
bpf_prog_array_delete_safe() replaces a detached program with
dummy_bpf_prog on allocation failure, the dummy is statically
allocated and has NULL active, stats, and aux fields. Identify
it by prog->len == 0, since every real program has at least one
instruction.
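
The dummy-skip logic can be modeled in isolation. The sketch below uses
a reduced mock of struct bpf_prog (only the len field) and is not the
kernel implementation:

```c
#include <stddef.h>

/* Reduced mock: only the field the skip test needs. */
struct mock_prog {
	unsigned int len;	/* 0 only for the dummy placeholder */
};

/* Count the programs that would actually run in a NULL-terminated
 * array, skipping dummy placeholders (len == 0). */
static int count_runnable(struct mock_prog **items)
{
	int ran = 0;

	for (size_t i = 0; items[i]; i++) {
		if (!items[i]->len)	/* dummy_bpf_prog placeholder */
			continue;
		ran++;
	}
	return ran;
}
```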

Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cb6b9e70080..4bc39eca1863 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3079,6 +3079,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
 void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
 
+static __always_inline u32
+bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
+			     const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	struct bpf_prog *prog;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
+	u32 ret = 1;
+
+	if (unlikely(!array))
+		return ret;
+
+	migrate_disable();
+
+	run_ctx.is_uprobe = false;
+
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	item = &array->items[0];
+	while ((prog = READ_ONCE(item->prog))) {
+		/* Skip dummy_bpf_prog placeholder (len == 0) */
+		if (unlikely(!prog->len)) {
+			item++;
+			continue;
+		}
+
+		if (unlikely(!bpf_prog_get_recursion_context(prog))) {
+			bpf_prog_inc_misses_counter(prog);
+			bpf_prog_put_recursion_context(prog);
+			item++;
+			continue;
+		}
+
+		run_ctx.bpf_cookie = item->bpf_cookie;
+
+		if (prog->sleepable) {
+			ret &= run_prog(prog, ctx);
+		} else {
+			guard(rcu)();
+			ret &= run_prog(prog, ctx);
+		}
+
+		bpf_prog_put_recursion_context(prog);
+		item++;
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	migrate_enable();
+	return ret;
+}
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v12 3/6] bpf: Add sleepable support for classic tracepoint programs
  2026-04-22 15:27 [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
  2026-04-22 15:27 ` [PATCH bpf-next v12 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
  2026-04-22 15:27 ` [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
@ 2026-04-22 15:27 ` Mykyta Yatsenko
  2026-04-22 23:06   ` sashiko-bot
  2026-04-22 15:27 ` [PATCH bpf-next v12 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 15:27 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
faultable tracepoints that supports sleepable BPF programs. It uses
rcu_tasks_trace for lifetime protection and
bpf_prog_run_array_sleepable() for per-program RCU flavor selection,
following the uprobe_prog_run() pattern.

Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
programs before perf event processing. Previously, BPF ran after the
per-cpu perf trace buffer was allocated under preempt_disable,
requiring cleanup via perf_swevent_put_recursion_context() when the
program filtered the event.
Now BPF runs in faultable context before preempt_disable, reading
syscall arguments from local variables instead of the per-cpu trace
record, removing the dependency on buffer allocation. This allows
sleepable BPF programs to execute and avoids unnecessary buffer
allocation when BPF filters the event. The perf event submission
path (buffer allocation, fill, submit) remains under preempt_disable
as before. Since BPF no longer runs within the buffer allocation
context, the fake_regs output parameter to perf_trace_buf_alloc()
is no longer needed and is replaced with NULL.
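
The reordering boils down to a filtering decision made before any
preempt-disabled work. A condensed decision function (illustrative
names, not kernel code) is:

```c
#include <stdbool.h>

/* New ordering in perf_syscall_enter()/exit(): BPF runs first, in
 * faultable context, so a filtering verdict skips the preempt-disabled
 * buffer allocation and submission entirely. */
static bool should_submit_event(bool have_progs, bool bpf_verdict,
				bool have_perf_events)
{
	if (have_progs && !bpf_verdict)
		return false;		/* BPF filtered the event */
	return have_perf_events;	/* only then allocate and submit */
}
```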

Add an attach-time check in __perf_event_set_bpf_prog() to reject
sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
tracepoints, since only syscall tracepoints run in faultable context.

This prepares the classic tracepoint runtime and attach paths for
sleepable programs. The verifier changes to allow loading sleepable
BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch.

To: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # for BPF bits
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/linux/trace_events.h  |   6 +++
 kernel/events/core.c          |   9 ++++
 kernel/trace/bpf_trace.c      |  28 +++++++++++
 kernel/trace/trace_syscalls.c | 110 ++++++++++++++++++++++--------------------
 4 files changed, 101 insertions(+), 52 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 40a43a4c7caf..d49338c44014 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -770,6 +770,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 
 #ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
@@ -792,6 +793,11 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
 	return 1;
 }
 
+static inline unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	return 1;
+}
+
 static inline int
 perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
 {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6d1f8bad7e1c..0f9cacfa7cb8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11643,6 +11643,15 @@ static int __perf_event_set_bpf_prog(struct perf_event *event,
 		/* only uprobe programs are allowed to be sleepable */
 		return -EINVAL;
 
+	if (prog->type == BPF_PROG_TYPE_TRACEPOINT && prog->sleepable) {
+		/*
+		 * Sleepable tracepoint programs can only attach to faultable
+		 * tracepoints. Currently only syscall tracepoints are faultable.
+		 */
+		if (!is_syscall_tp)
+			return -EINVAL;
+	}
+
 	/* Kprobe override only works for kprobes, not uprobes. */
 	if (prog->kprobe_override && !is_kprobe)
 		return -EINVAL;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7276c72c1d31..a822c589c9bd 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -152,6 +152,34 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 	return ret;
 }
 
+/**
+ * trace_call_bpf_faultable - invoke BPF program in faultable context
+ * @call: tracepoint event
+ * @ctx: opaque context pointer
+ *
+ * Variant of trace_call_bpf() for faultable tracepoints (syscall
+ * tracepoints). Supports sleepable BPF programs by using rcu_tasks_trace
+ * for lifetime protection and bpf_prog_run_array_sleepable() for per-program
+ * RCU flavor selection, following the uprobe pattern.
+ *
+ * Per-program recursion protection is provided by
+ * bpf_prog_run_array_sleepable(). Global bpf_prog_active is not
+ * needed because syscall tracepoints cannot self-recurse.
+ *
+ * Must be called from a faultable/preemptible context.
+ */
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	struct bpf_prog_array *prog_array;
+
+	might_fault();
+	guard(rcu_tasks_trace)();
+
+	prog_array = rcu_dereference_check(call->prog_array,
+					   rcu_read_lock_trace_held());
+	return bpf_prog_run_array_sleepable(prog_array, ctx, bpf_prog_run);
+}
+
 #ifdef CONFIG_BPF_KPROBE_OVERRIDE
 BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
 {
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8ad72e17d8eb..e98ee7e1e66f 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -1371,33 +1371,33 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
 static int sys_perf_refcount_enter;
 static int sys_perf_refcount_exit;
 
-static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *regs,
+static int perf_call_bpf_enter(struct trace_event_call *call,
 			       struct syscall_metadata *sys_data,
-			       struct syscall_trace_enter *rec)
+			       int syscall_nr, unsigned long *args)
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
 		int syscall_nr;
 		unsigned long args[SYSCALL_DEFINE_MAXARGS];
 	} __aligned(8) param;
+	struct pt_regs regs = {};
 	int i;
 
 	BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *));
 
-	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
-	perf_fetch_caller_regs(regs);
-	*(struct pt_regs **)&param = regs;
-	param.syscall_nr = rec->nr;
+	/* bpf prog requires 'regs' to be the first member in the ctx */
+	perf_fetch_caller_regs(&regs);
+	*(struct pt_regs **)&param = &regs;
+	param.syscall_nr = syscall_nr;
 	for (i = 0; i < sys_data->nb_args; i++)
-		param.args[i] = rec->args[i];
-	return trace_call_bpf(call, &param);
+		param.args[i] = args[i];
+	return trace_call_bpf_faultable(call, &param);
 }
 
 static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_enter *rec;
-	struct pt_regs *fake_regs;
 	struct hlist_head *head;
 	unsigned long args[6];
 	bool valid_prog_array;
@@ -1410,12 +1410,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	int size = 0;
 	int uargs = 0;
 
-	/*
-	 * Syscall probe called with preemption enabled, but the ring
-	 * buffer and per-cpu data require preemption to be disabled.
-	 */
 	might_fault();
-	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
@@ -1429,6 +1424,26 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 
 	syscall_get_arguments(current, regs, args);
 
+	/*
+	 * Run BPF program in faultable context before per-cpu buffer
+	 * allocation, allowing sleepable BPF programs to execute.
+	 */
+	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
+	if (valid_prog_array &&
+	    !perf_call_bpf_enter(sys_data->enter_event, sys_data,
+				 syscall_nr, args))
+		return;
+
+	/*
+	 * Per-cpu ring buffer and perf event list operations require
+	 * preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
+	head = this_cpu_ptr(sys_data->enter_event->perf_events);
+	if (hlist_empty(head))
+		return;
+
 	/* Check if this syscall event faults in user space memory */
 	mayfault = sys_data->user_mask != 0;
 
@@ -1438,17 +1453,12 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 			return;
 	}
 
-	head = this_cpu_ptr(sys_data->enter_event->perf_events);
-	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
-	if (!valid_prog_array && hlist_empty(head))
-		return;
-
 	/* get the size after alignment with the u32 buffer size field */
 	size += sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	rec = perf_trace_buf_alloc(size, &fake_regs, &rctx);
+	rec = perf_trace_buf_alloc(size, NULL, &rctx);
 	if (!rec)
 		return;
 
@@ -1458,13 +1468,6 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	if (mayfault)
 		syscall_put_data(sys_data, rec, user_ptr, size, user_sizes, uargs);
 
-	if ((valid_prog_array &&
-	     !perf_call_bpf_enter(sys_data->enter_event, fake_regs, sys_data, rec)) ||
-	    hlist_empty(head)) {
-		perf_swevent_put_recursion_context(rctx);
-		return;
-	}
-
 	perf_trace_buf_submit(rec, size, rctx,
 			      sys_data->enter_event->event.type, 1, regs,
 			      head, NULL);
@@ -1514,40 +1517,35 @@ static void perf_sysenter_disable(struct trace_event_call *call)
 		syscall_fault_buffer_disable();
 }
 
-static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *regs,
-			      struct syscall_trace_exit *rec)
+static int perf_call_bpf_exit(struct trace_event_call *call,
+			      int syscall_nr, long ret_val)
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
 		int syscall_nr;
 		unsigned long ret;
 	} __aligned(8) param;
-
-	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
-	perf_fetch_caller_regs(regs);
-	*(struct pt_regs **)&param = regs;
-	param.syscall_nr = rec->nr;
-	param.ret = rec->ret;
-	return trace_call_bpf(call, &param);
+	struct pt_regs regs = {};
+
+	/* bpf prog requires 'regs' to be the first member in the ctx */
+	perf_fetch_caller_regs(&regs);
+	*(struct pt_regs **)&param = &regs;
+	param.syscall_nr = syscall_nr;
+	param.ret = ret_val;
+	return trace_call_bpf_faultable(call, &param);
 }
 
 static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_exit *rec;
-	struct pt_regs *fake_regs;
 	struct hlist_head *head;
 	bool valid_prog_array;
 	int syscall_nr;
 	int rctx;
 	int size;
 
-	/*
-	 * Syscall probe called with preemption enabled, but the ring
-	 * buffer and per-cpu data require preemption to be disabled.
-	 */
 	might_fault();
-	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
@@ -1559,29 +1557,37 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
-	head = this_cpu_ptr(sys_data->exit_event->perf_events);
+	/*
+	 * Run BPF program in faultable context before per-cpu buffer
+	 * allocation, allowing sleepable BPF programs to execute.
+	 */
 	valid_prog_array = bpf_prog_array_valid(sys_data->exit_event);
-	if (!valid_prog_array && hlist_empty(head))
+	if (valid_prog_array &&
+	    !perf_call_bpf_exit(sys_data->exit_event, syscall_nr,
+				syscall_get_return_value(current, regs)))
+		return;
+
+	/*
+	 * Per-cpu ring buffer and perf event list operations require
+	 * preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
+	head = this_cpu_ptr(sys_data->exit_event->perf_events);
+	if (hlist_empty(head))
 		return;
 
 	/* We can probably do that at build time */
 	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	rec = perf_trace_buf_alloc(size, &fake_regs, &rctx);
+	rec = perf_trace_buf_alloc(size, NULL, &rctx);
 	if (!rec)
 		return;
 
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	if ((valid_prog_array &&
-	     !perf_call_bpf_exit(sys_data->exit_event, fake_regs, rec)) ||
-	    hlist_empty(head)) {
-		perf_swevent_put_recursion_context(rctx);
-		return;
-	}
-
 	perf_trace_buf_submit(rec, size, rctx, sys_data->exit_event->event.type,
 			      1, regs, head, NULL);
 }

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v12 4/6] bpf: Verifier support for sleepable tracepoint programs
  2026-04-22 15:27 [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (2 preceding siblings ...)
  2026-04-22 15:27 ` [PATCH bpf-next v12 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
@ 2026-04-22 15:27 ` Mykyta Yatsenko
  2026-04-22 15:27 ` [PATCH bpf-next v12 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
  2026-04-22 15:27 ` [PATCH bpf-next v12 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
  5 siblings, 0 replies; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 15:27 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Allow BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_TRACEPOINT, and
BPF_TRACE_RAW_TP (tp_btf) programs to be sleepable by adding them
to can_be_sleepable().

For BTF-based raw tracepoints (tp_btf), add a load-time check in
bpf_check_attach_target() that rejects sleepable programs attaching
to non-faultable tracepoints with a descriptive error message.

For raw tracepoints (raw_tp), add an attach-time check in
bpf_raw_tp_link_attach() that rejects sleepable programs on
non-faultable tracepoints. The attach-time check is needed because
the tracepoint name is not known at load time for raw_tp.

The attach-time check for classic tracepoints (tp) in
__perf_event_set_bpf_prog() was added in the previous patch.

Replace the verbose error message that enumerates allowed program
types with a generic "Program of this type cannot be sleepable"
message, since the list of sleepable-capable types keeps growing.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 kernel/bpf/syscall.c  |  5 +++++
 kernel/bpf/verifier.c | 13 +++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a3c0214ca934..3b1f0ba02f61 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4281,6 +4281,11 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 	if (!btp)
 		return -ENOENT;
 
+	if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+		bpf_put_raw_tracepoint(btp);
+		return -EINVAL;
+	}
+
 	link = kzalloc_obj(*link, GFP_USER);
 	if (!link) {
 		err = -ENOMEM;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 185210b73385..5b4806fdb648 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19267,6 +19267,12 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		btp = bpf_get_raw_tracepoint(tname);
 		if (!btp)
 			return -EINVAL;
+		if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+			bpf_log(log, "Sleepable program cannot attach to non-faultable tracepoint %s\n",
+				tname);
+			bpf_put_raw_tracepoint(btp);
+			return -EINVAL;
+		}
 		fname = kallsyms_lookup((unsigned long)btp->bpf_func, NULL, NULL, NULL,
 					trace_symbol);
 		bpf_put_raw_tracepoint(btp);
@@ -19483,6 +19489,7 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 		case BPF_MODIFY_RETURN:
 		case BPF_TRACE_ITER:
 		case BPF_TRACE_FSESSION:
+		case BPF_TRACE_RAW_TP:
 			return true;
 		default:
 			return false;
@@ -19490,7 +19497,9 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 	}
 	return prog->type == BPF_PROG_TYPE_LSM ||
 	       prog->type == BPF_PROG_TYPE_KPROBE /* only for uprobes */ ||
-	       prog->type == BPF_PROG_TYPE_STRUCT_OPS;
+	       prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
+	       prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT ||
+	       prog->type == BPF_PROG_TYPE_TRACEPOINT;
 }
 
 static int check_attach_btf_id(struct bpf_verifier_env *env)
@@ -19512,7 +19521,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	}
 
 	if (prog->sleepable && !can_be_sleepable(prog)) {
-		verbose(env, "Only fentry/fexit/fsession/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable\n");
+		verbose(env, "Program of this type cannot be sleepable\n");
 		return -EINVAL;
 	}
 

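[Editor's note] The net effect of the can_be_sleepable() change can be
sanity-checked with a small standalone userspace model. The enums and
helper below are illustrative stand-ins, not the kernel definitions --
a sketch of the decision table only:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's enum bpf_prog_type /
 * enum bpf_attach_type; values and coverage are deliberately partial.
 */
enum prog_type { TRACING, LSM, KPROBE, STRUCT_OPS,
		 RAW_TRACEPOINT, TRACEPOINT, XDP };
enum attach_type { TRACE_FENTRY, TRACE_ITER, TRACE_RAW_TP, NONE };

/* Mirrors the decision logic of can_be_sleepable() after this patch:
 * TRACING programs are gated on their attach type, everything else
 * on the program type alone.
 */
static bool can_be_sleepable(enum prog_type type, enum attach_type eatype)
{
	if (type == TRACING) {
		switch (eatype) {
		case TRACE_FENTRY:
		case TRACE_ITER:
		case TRACE_RAW_TP:	/* newly allowed: tp_btf */
			return true;
		default:
			return false;
		}
	}
	return type == LSM ||
	       type == KPROBE /* only for uprobes */ ||
	       type == STRUCT_OPS ||
	       type == RAW_TRACEPOINT ||	/* newly allowed */
	       type == TRACEPOINT;		/* newly allowed */
}
```

This only answers "may this program type be sleepable at all"; the
faultable-tracepoint restriction is enforced separately, at load time
for tp_btf and at attach time for raw_tp and tp, as described above.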
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v12 5/6] libbpf: Add section handlers for sleepable tracepoints
  2026-04-22 15:27 [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (3 preceding siblings ...)
  2026-04-22 15:27 ` [PATCH bpf-next v12 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
@ 2026-04-22 15:27 ` Mykyta Yatsenko
  2026-04-22 15:27 ` [PATCH bpf-next v12 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
  5 siblings, 0 replies; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 15:27 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add SEC_DEF entries for sleepable tracepoint variants:
  - "tp_btf.s+"     for sleepable BTF-based raw tracepoints
  - "raw_tp.s+"     for sleepable raw tracepoints
  - "raw_tracepoint.s+" (alias)
  - "tp.s+"         for sleepable classic tracepoints
  - "tracepoint.s+" (alias)

Extract sec_name_match_prefix() to share the prefix matching logic
between attach_tp() and attach_raw_tp(), eliminating duplicated
loops and hardcoded strcmp() checks for bare section names.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 tools/lib/bpf/libbpf.c | 88 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 31 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 83aae7a39d36..ab2071fdd3e8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10018,11 +10018,16 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("netkit/peer",		SCHED_CLS, BPF_NETKIT_PEER, SEC_NONE),
 	SEC_DEF("tracepoint+",		TRACEPOINT, 0, SEC_NONE, attach_tp),
 	SEC_DEF("tp+",			TRACEPOINT, 0, SEC_NONE, attach_tp),
+	SEC_DEF("tracepoint.s+",	TRACEPOINT, 0, SEC_SLEEPABLE, attach_tp),
+	SEC_DEF("tp.s+",		TRACEPOINT, 0, SEC_SLEEPABLE, attach_tp),
 	SEC_DEF("raw_tracepoint+",	RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("raw_tp+",		RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
+	SEC_DEF("raw_tracepoint.s+",	RAW_TRACEPOINT, 0, SEC_SLEEPABLE, attach_raw_tp),
+	SEC_DEF("raw_tp.s+",		RAW_TRACEPOINT, 0, SEC_SLEEPABLE, attach_raw_tp),
 	SEC_DEF("raw_tracepoint.w+",	RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("raw_tp.w+",		RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("tp_btf+",		TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF, attach_trace),
+	SEC_DEF("tp_btf.s+",		TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
 	SEC_DEF("fentry+",		TRACING, BPF_TRACE_FENTRY, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("fmod_ret+",		TRACING, BPF_MODIFY_RETURN, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("fexit+",		TRACING, BPF_TRACE_FEXIT, SEC_ATTACH_BTF, attach_trace),
@@ -13152,25 +13157,61 @@ struct bpf_link *bpf_program__attach_tracepoint(const struct bpf_program *prog,
 	return bpf_program__attach_tracepoint_opts(prog, tp_category, tp_name, NULL);
 }
 
+/*
+ * Match section name against a prefix array. Returns pointer past
+ * "prefix/" on match, empty string for bare sections (exact prefix
+ * match), or NULL if no prefix matches.
+ */
+static const char *sec_name_match_prefix(const char *sec_name,
+					 const char *const *prefixes,
+					 size_t n)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+		size_t pfx_len;
+
+		if (!str_has_pfx(sec_name, prefixes[i]))
+			continue;
+
+		pfx_len = strlen(prefixes[i]);
+		if (sec_name[pfx_len] == '\0')
+			return sec_name + pfx_len;
+
+		if (sec_name[pfx_len] != '/' || sec_name[pfx_len + 1] == '\0')
+			continue;
+
+		return sec_name + pfx_len + 1;
+	}
+	return NULL;
+}
+
 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link)
 {
+	static const char *const prefixes[] = {
+		"tp.s",
+		"tp",
+		"tracepoint.s",
+		"tracepoint",
+	};
 	char *sec_name, *tp_cat, *tp_name;
+	const char *match;
 
 	*link = NULL;
 
-	/* no auto-attach for SEC("tp") or SEC("tracepoint") */
-	if (strcmp(prog->sec_name, "tp") == 0 || strcmp(prog->sec_name, "tracepoint") == 0)
+	match = sec_name_match_prefix(prog->sec_name, prefixes, ARRAY_SIZE(prefixes));
+	if (!match) {
+		pr_warn("prog '%s': invalid section name '%s'\n", prog->name, prog->sec_name);
+		return -EINVAL;
+	}
+	if (!match[0]) /* bare section name, no auto-attach */
 		return 0;
 
 	sec_name = strdup(prog->sec_name);
 	if (!sec_name)
 		return -ENOMEM;
 
-	/* extract "tp/<category>/<name>" or "tracepoint/<category>/<name>" */
-	if (str_has_pfx(prog->sec_name, "tp/"))
-		tp_cat = sec_name + sizeof("tp/") - 1;
-	else
-		tp_cat = sec_name + sizeof("tracepoint/") - 1;
+	tp_cat = sec_name + (match - prog->sec_name);
 	tp_name = strchr(tp_cat, '/');
 	if (!tp_name) {
 		free(sec_name);
@@ -13234,37 +13275,22 @@ static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf
 		"raw_tracepoint",
 		"raw_tp.w",
 		"raw_tracepoint.w",
+		"raw_tp.s",
+		"raw_tracepoint.s",
 	};
-	size_t i;
-	const char *tp_name = NULL;
+	const char *match;
 
 	*link = NULL;
 
-	for (i = 0; i < ARRAY_SIZE(prefixes); i++) {
-		size_t pfx_len;
-
-		if (!str_has_pfx(prog->sec_name, prefixes[i]))
-			continue;
-
-		pfx_len = strlen(prefixes[i]);
-		/* no auto-attach case of, e.g., SEC("raw_tp") */
-		if (prog->sec_name[pfx_len] == '\0')
-			return 0;
-
-		if (prog->sec_name[pfx_len] != '/')
-			continue;
-
-		tp_name = prog->sec_name + pfx_len + 1;
-		break;
-	}
-
-	if (!tp_name) {
-		pr_warn("prog '%s': invalid section name '%s'\n",
-			prog->name, prog->sec_name);
+	match = sec_name_match_prefix(prog->sec_name, prefixes, ARRAY_SIZE(prefixes));
+	if (!match) {
+		pr_warn("prog '%s': invalid section name '%s'\n", prog->name, prog->sec_name);
 		return -EINVAL;
 	}
+	if (!match[0])
+		return 0;
 
-	*link = bpf_program__attach_raw_tracepoint(prog, tp_name);
+	*link = bpf_program__attach_raw_tracepoint(prog, match);
 	return libbpf_get_error(*link);
 }
 

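[Editor's note] The contract of the new sec_name_match_prefix() helper
is easy to exercise in isolation. A self-contained userspace sketch
(str_has_pfx() is libbpf-internal, so a local equivalent is used here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Local stand-in for libbpf's internal str_has_pfx() helper. */
static int str_has_pfx(const char *s, const char *pfx)
{
	return strncmp(s, pfx, strlen(pfx)) == 0;
}

/* Same contract as the patch: returns the pointer past "prefix/" on a
 * match, an empty string for a bare section name (exact prefix match,
 * i.e. no auto-attach), or NULL when no prefix applies. Note that a
 * shorter prefix such as "tp" cannot shadow "tp.s": the '/'-check
 * makes the loop fall through to the longer sibling.
 */
static const char *sec_name_match_prefix(const char *sec_name,
					 const char *const *prefixes,
					 size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		size_t pfx_len;

		if (!str_has_pfx(sec_name, prefixes[i]))
			continue;

		pfx_len = strlen(prefixes[i]);
		if (sec_name[pfx_len] == '\0')
			return sec_name + pfx_len;

		if (sec_name[pfx_len] != '/' || sec_name[pfx_len + 1] == '\0')
			continue;

		return sec_name + pfx_len + 1;
	}
	return NULL;
}

static const char *const tp_prefixes[] = {
	"tp.s", "tp", "tracepoint.s", "tracepoint",
};
```

With the ".s" variants added to the prefix tables, this one helper
serves both the sleepable and non-sleepable section names in
attach_tp() and attach_raw_tp().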
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v12 6/6] selftests/bpf: Add tests for sleepable tracepoint programs
  2026-04-22 15:27 [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (4 preceding siblings ...)
  2026-04-22 15:27 ` [PATCH bpf-next v12 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
@ 2026-04-22 15:27 ` Mykyta Yatsenko
  5 siblings, 0 replies; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 15:27 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Cover all three sleepable tracepoint types (tp_btf.s, raw_tp.s, tp.s)
and sys_exit (via bpf_task_pt_regs) with functional tests using
bpf_copy_from_user() on getcwd. Verify alias and bare SEC variants,
bpf_prog_test_run_raw_tp() with BPF_F_TEST_RUN_ON_CPU rejection,
attach-time rejection on non-faultable tracepoints, and load-time
rejection for sleepable tp_btf on non-faultable tracepoints.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 .../bpf/prog_tests/sleepable_tracepoints.c         | 142 +++++++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints.c         | 112 ++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints_fail.c    |  18 +++
 tools/testing/selftests/bpf/verifier/sleepable.c   |  17 ++-
 4 files changed, 287 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
new file mode 100644
index 000000000000..19500b785ee3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <unistd.h>
+#include "test_sleepable_tracepoints.skel.h"
+#include "test_sleepable_tracepoints_fail.skel.h"
+
+static void run_test(struct test_sleepable_tracepoints *skel)
+{
+	char buf[PATH_MAX] = "/";
+
+	skel->bss->target_pid = getpid();
+	skel->bss->prog_triggered = 0;
+	skel->bss->err = 0;
+	skel->bss->copied_byte = 0;
+
+	syscall(__NR_getcwd, buf, sizeof(buf));
+
+	ASSERT_EQ(skel->bss->prog_triggered, 1, "prog_triggered");
+	ASSERT_EQ(skel->bss->err, 0, "err");
+	ASSERT_EQ(skel->bss->copied_byte, '/', "copied_byte");
+}
+
+static void run_auto_attach_test(struct bpf_program *prog,
+				 struct test_sleepable_tracepoints *skel)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		return;
+
+	run_test(skel);
+	bpf_link__destroy(link);
+}
+
+static void test_attach_only(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (ASSERT_OK_PTR(link, "attach"))
+		bpf_link__destroy(link);
+}
+
+static void test_attach_reject(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_ERR_PTR(link, "attach_should_fail"))
+		bpf_link__destroy(link);
+}
+
+static void test_raw_tp_bare(struct test_sleepable_tracepoints *skel)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach_raw_tracepoint(skel->progs.handle_raw_tp_bare,
+						  "sys_enter");
+	if (ASSERT_OK_PTR(link, "attach"))
+		bpf_link__destroy(link);
+}
+
+static void test_tp_bare(struct test_sleepable_tracepoints *skel)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach_tracepoint(skel->progs.handle_tp_bare,
+					      "syscalls", "sys_enter_getcwd");
+	if (ASSERT_OK_PTR(link, "attach"))
+		bpf_link__destroy(link);
+}
+
+static void test_test_run(struct test_sleepable_tracepoints *skel)
+{
+	__u64 args[2] = {0x1234ULL, 0x5678ULL};
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.ctx_in = args,
+		.ctx_size_in = sizeof(args),
+	);
+	int fd, err;
+
+	fd = bpf_program__fd(skel->progs.handle_test_run);
+	err = bpf_prog_test_run_opts(fd, &topts);
+	ASSERT_OK(err, "test_run");
+	ASSERT_EQ(topts.retval, args[0] + args[1], "test_run_retval");
+}
+
+static void test_test_run_on_cpu_reject(struct test_sleepable_tracepoints *skel)
+{
+	__u64 args[2] = {};
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.ctx_in = args,
+		.ctx_size_in = sizeof(args),
+		.flags = BPF_F_TEST_RUN_ON_CPU,
+	);
+	int fd, err;
+
+	fd = bpf_program__fd(skel->progs.handle_test_run);
+	err = bpf_prog_test_run_opts(fd, &topts);
+	ASSERT_ERR(err, "test_run_on_cpu_reject");
+}
+
+void test_sleepable_tracepoints(void)
+{
+	struct test_sleepable_tracepoints *skel;
+
+	skel = test_sleepable_tracepoints__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	if (test__start_subtest("tp_btf"))
+		run_auto_attach_test(skel->progs.handle_sys_enter_tp_btf, skel);
+	if (test__start_subtest("raw_tp"))
+		run_auto_attach_test(skel->progs.handle_sys_enter_raw_tp, skel);
+	if (test__start_subtest("tracepoint"))
+		run_auto_attach_test(skel->progs.handle_sys_enter_tp, skel);
+	if (test__start_subtest("sys_exit"))
+		run_auto_attach_test(skel->progs.handle_sys_exit_tp, skel);
+	if (test__start_subtest("tracepoint_alias"))
+		test_attach_only(skel->progs.handle_sys_enter_tp_alias);
+	if (test__start_subtest("raw_tracepoint_alias"))
+		test_attach_only(skel->progs.handle_sys_enter_raw_tp_alias);
+	if (test__start_subtest("raw_tp_bare"))
+		test_raw_tp_bare(skel);
+	if (test__start_subtest("tp_bare"))
+		test_tp_bare(skel);
+	if (test__start_subtest("test_run"))
+		test_test_run(skel);
+	if (test__start_subtest("test_run_on_cpu_reject"))
+		test_test_run_on_cpu_reject(skel);
+	if (test__start_subtest("raw_tp_non_faultable"))
+		test_attach_reject(skel->progs.handle_raw_tp_non_faultable);
+	if (test__start_subtest("tp_non_syscall"))
+		test_attach_reject(skel->progs.handle_tp_non_syscall);
+	if (test__start_subtest("tp_btf_non_faultable_reject"))
+		RUN_TESTS(test_sleepable_tracepoints_fail);
+
+	test_sleepable_tracepoints__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c
new file mode 100644
index 000000000000..254f7fd895d9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <asm/unistd.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+int target_pid;
+int prog_triggered;
+long err;
+char copied_byte;
+
+static int copy_getcwd_arg(char *ubuf)
+{
+	err = bpf_copy_from_user(&copied_byte, sizeof(copied_byte), ubuf);
+	if (err)
+		return err;
+
+	prog_triggered = 1;
+	return 0;
+}
+
+SEC("tp_btf.s/sys_enter")
+int BPF_PROG(handle_sys_enter_tp_btf, struct pt_regs *regs, long id)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid ||
+	    id != __NR_getcwd)
+		return 0;
+
+	return copy_getcwd_arg((void *)PT_REGS_PARM1_SYSCALL(regs));
+}
+
+SEC("raw_tp.s/sys_enter")
+int BPF_PROG(handle_sys_enter_raw_tp, struct pt_regs *regs, long id)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid ||
+	    id != __NR_getcwd)
+		return 0;
+
+	return copy_getcwd_arg((void *)PT_REGS_PARM1_CORE_SYSCALL(regs));
+}
+
+SEC("tp.s/syscalls/sys_enter_getcwd")
+int handle_sys_enter_tp(struct syscall_trace_enter *args)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	return copy_getcwd_arg((void *)args->args[0]);
+}
+
+SEC("tp.s/syscalls/sys_exit_getcwd")
+int handle_sys_exit_tp(struct syscall_trace_exit *args)
+{
+	struct pt_regs *regs;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	regs = (struct pt_regs *)bpf_task_pt_regs(bpf_get_current_task_btf());
+	return copy_getcwd_arg((void *)PT_REGS_PARM1_CORE_SYSCALL(regs));
+}
+
+SEC("raw_tp.s")
+int BPF_PROG(handle_raw_tp_bare, struct pt_regs *regs, long id)
+{
+	return 0;
+}
+
+SEC("tp.s")
+int handle_tp_bare(void *ctx)
+{
+	return 0;
+}
+
+SEC("tracepoint.s/syscalls/sys_enter_getcwd")
+int handle_sys_enter_tp_alias(struct syscall_trace_enter *args)
+{
+	return 0;
+}
+
+SEC("raw_tracepoint.s/sys_enter")
+int BPF_PROG(handle_sys_enter_raw_tp_alias, struct pt_regs *regs, long id)
+{
+	return 0;
+}
+
+SEC("raw_tp.s/sys_enter")
+int BPF_PROG(handle_test_run, struct pt_regs *regs, long id)
+{
+	if ((__u64)regs == 0x1234ULL && (__u64)id == 0x5678ULL)
+		return (__u64)regs + (__u64)id;
+
+	return 0;
+}
+
+SEC("raw_tp.s/sched_switch")
+int BPF_PROG(handle_raw_tp_non_faultable, bool preempt,
+	     struct task_struct *prev, struct task_struct *next)
+{
+	return 0;
+}
+
+SEC("tp.s/sched/sched_switch")
+int handle_tp_non_syscall(void *ctx)
+{
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c
new file mode 100644
index 000000000000..1a0748a9520b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+/* Sleepable program on a non-faultable tracepoint should fail to load */
+SEC("tp_btf.s/sched_switch")
+__failure __msg("Sleepable program cannot attach to non-faultable tracepoint")
+int BPF_PROG(handle_sched_switch, bool preempt,
+	     struct task_struct *prev, struct task_struct *next)
+{
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/verifier/sleepable.c b/tools/testing/selftests/bpf/verifier/sleepable.c
index c2b7f5ebf168..6dabc5522945 100644
--- a/tools/testing/selftests/bpf/verifier/sleepable.c
+++ b/tools/testing/selftests/bpf/verifier/sleepable.c
@@ -76,7 +76,20 @@
 	.runs = -1,
 },
 {
-	"sleepable raw tracepoint reject",
+	"sleepable raw tracepoint accept",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACING,
+	.expected_attach_type = BPF_TRACE_RAW_TP,
+	.kfunc = "sys_enter",
+	.result = ACCEPT,
+	.flags = BPF_F_SLEEPABLE,
+	.runs = -1,
+},
+{
+	"sleepable raw tracepoint reject non-faultable",
 	.insns = {
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
@@ -85,7 +98,7 @@
 	.expected_attach_type = BPF_TRACE_RAW_TP,
 	.kfunc = "sched_switch",
 	.result = REJECT,
-	.errstr = "Only fentry/fexit/fsession/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable",
+	.errstr = "Sleepable program cannot attach to non-faultable tracepoint",
 	.flags = BPF_F_SLEEPABLE,
 	.runs = -1,
 },

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 15:27 ` [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
@ 2026-04-22 16:06   ` bot+bpf-ci
  2026-04-22 16:36     ` Mykyta Yatsenko
  2026-04-22 22:02   ` sashiko-bot
  1 sibling, 1 reply; 16+ messages in thread
From: bot+bpf-ci @ 2026-04-22 16:06 UTC (permalink / raw)
  To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
	eddyz87, memxor, peterz, rostedt
  Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
	clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 3735 bytes --]

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3079,6 +3079,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>  void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
>  void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
>
> +static __always_inline u32
> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
> +			     const void *ctx, bpf_prog_run_fn run_prog)
> +{
> +	const struct bpf_prog_array_item *item;
> +	struct bpf_prog *prog;
> +	struct bpf_run_ctx *old_run_ctx;
> +	struct bpf_trace_run_ctx run_ctx;
> +	u32 ret = 1;
> +
> +	if (unlikely(!array))
> +		return ret;
> +
> +	migrate_disable();
> +
> +	run_ctx.is_uprobe = false;
> +
> +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> +	item = &array->items[0];
> +	while ((prog = READ_ONCE(item->prog))) {
> +		/* Skip dummy_bpf_prog placeholder (len == 0) */
> +		if (unlikely(!prog->len)) {
> +			item++;
> +			continue;
> +		}
> +
> +		if (unlikely(!bpf_prog_get_recursion_context(prog))) {
> +			bpf_prog_inc_misses_counter(prog);
> +			bpf_prog_put_recursion_context(prog);
> +			item++;
> +			continue;
> +		}
> +
> +		run_ctx.bpf_cookie = item->bpf_cookie;
> +
> +		if (prog->sleepable) {
> +			guard(rcu)();
> +			ret &= run_prog(prog, ctx);
> +		} else {
> +			ret &= run_prog(prog, ctx);
> +		}

The sleepable/non-sleepable branches appear inverted relative to
bpf_prog_run_array_uprobe(). In the uprobe sibling at
include/linux/bpf.h:2491-2526, classic RCU is acquired only for
non-sleepable programs:

  while ((prog = READ_ONCE(item->prog))) {
      if (!prog->sleepable)
          rcu_read_lock();
      run_ctx.bpf_cookie = item->bpf_cookie;
      ret &= run_prog(prog, ctx);
      item++;
      if (!prog->sleepable)
          rcu_read_unlock();
  }

Here, the condition takes classic RCU (guard(rcu)() expands to
rcu_read_lock()/rcu_read_unlock() via DEFINE_LOCK_GUARD_0 in
include/linux/rcupdate.h:1181) around sleepable programs while taking no
RCU read-side lock for non-sleepable ones.

Sleepable BPF programs are designed to fault and sleep. Executing them
inside a classic RCU read-side critical section violates the RCU
requirement that classic readers must not block. On CONFIG_PREEMPT_RCU=n
kernels this can cause RCU stalls.

Meanwhile, non-sleepable BPF programs often rely on classic-RCU-protected
structures (such as RCU-protected map entries). Invoking them with no
classic RCU read-side critical section weakens protection. The caller added
in the subsequent commit 864dee5d55d1 holds only guard(rcu_tasks_trace)(),
so no outer classic RCU protects these accesses.

Should the condition be inverted to match the uprobe pattern?

This was raised by Alexei Starovoitov in v11 review:
https://lore.kernel.org/bpf/DHZ4I3RWVVCU.16XRZX1HPFI8K@gmail.com/

Alexei suggested in his April 21st comment:

  "Why grab rcu_read_lock() and undo it?
  imo it would be cleaner and faster to do
  bpf_prog_get_recursion_context() here ...

  ... and then here:
  if (!prog->sleepable) {
    guard(rcu)();
    ret &= run_prog(prog, ctx);
  } else {
    ret &= run_prog(prog, ctx);
  }"

Was this feedback addressed in v12?

> +
> +		bpf_prog_put_recursion_context(prog);
> +		item++;
> +	}
> +	bpf_reset_run_ctx(old_run_ctx);
> +	migrate_enable();
> +	return ret;
> +}
> +
>  #else /* !CONFIG_BPF_SYSCALL */
>  static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>  {


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24787752194

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 16:06   ` bot+bpf-ci
@ 2026-04-22 16:36     ` Mykyta Yatsenko
  2026-04-22 17:00       ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 16:36 UTC (permalink / raw)
  To: bot+bpf-ci, bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87,
	memxor, peterz, rostedt
  Cc: yatsenko, martin.lau, yonghong.song, clm, ihor.solodrai



On 4/22/26 5:06 PM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -3079,6 +3079,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>>   void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
>>   void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
>>
>> +static __always_inline u32
>> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
>> +			     const void *ctx, bpf_prog_run_fn run_prog)
>> +{
>> +	const struct bpf_prog_array_item *item;
>> +	struct bpf_prog *prog;
>> +	struct bpf_run_ctx *old_run_ctx;
>> +	struct bpf_trace_run_ctx run_ctx;
>> +	u32 ret = 1;
>> +
>> +	if (unlikely(!array))
>> +		return ret;
>> +
>> +	migrate_disable();
>> +
>> +	run_ctx.is_uprobe = false;
>> +
>> +	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>> +	item = &array->items[0];
>> +	while ((prog = READ_ONCE(item->prog))) {
>> +		/* Skip dummy_bpf_prog placeholder (len == 0) */
>> +		if (unlikely(!prog->len)) {
>> +			item++;
>> +			continue;
>> +		}
>> +
>> +		if (unlikely(!bpf_prog_get_recursion_context(prog))) {
>> +			bpf_prog_inc_misses_counter(prog);
>> +			bpf_prog_put_recursion_context(prog);
>> +			item++;
>> +			continue;
>> +		}
>> +
>> +		run_ctx.bpf_cookie = item->bpf_cookie;
>> +
>> +		if (prog->sleepable) {
>> +			guard(rcu)();
>> +			ret &= run_prog(prog, ctx);
>> +		} else {
>> +			ret &= run_prog(prog, ctx);
>> +		}
> 

Yep, this one should be inverted.
I'll send v13, if Alexei does not mind.

> The sleepable/non-sleepable branches appear inverted relative to
> bpf_prog_run_array_uprobe(). In the uprobe sibling at
> include/linux/bpf.h:2491-2526, classic RCU is acquired only for
> non-sleepable programs:
> 
>    while ((prog = READ_ONCE(item->prog))) {
>        if (!prog->sleepable)
>            rcu_read_lock();
>        run_ctx.bpf_cookie = item->bpf_cookie;
>        ret &= run_prog(prog, ctx);
>        item++;
>        if (!prog->sleepable)
>            rcu_read_unlock();
>    }
> 
> Here, the condition takes classic RCU (guard(rcu)() expands to
> rcu_read_lock()/rcu_read_unlock() via DEFINE_LOCK_GUARD_0 in
> include/linux/rcupdate.h:1181) around sleepable programs while taking no
> RCU read-side lock for non-sleepable ones.
> 
> Sleepable BPF programs are designed to fault and sleep. Executing them
> inside a classic RCU read-side critical section violates the RCU
> requirement that classic readers must not block. On CONFIG_PREEMPT_RCU=n
> kernels this can cause RCU stalls.
> 
> Meanwhile, non-sleepable BPF programs often rely on classic-RCU-protected
> structures (like classic RCU-protected map entries). Invoking them with no
> classic RCU read-side critical section weakens protection. The caller added
> in the subsequent commit 864dee5d55d1 holds only guard(rcu_tasks_trace)(),
> so no outer classic RCU protects these accesses.
> 
> Should the condition be inverted to match the uprobe pattern?
> 
> This was raised by Alexei Starovoitov in v11 review:
> https://lore.kernel.org/bpf/DHZ4I3RWVVCU.16XRZX1HPFI8K@gmail.com/
> 
> Alexei suggested in his April 21st comment:
> 
>    "Why grab rcu_read_lock() and undo it?
>    imo it would be cleaner and faster to do
>    bpf_prog_get_recursion_context() here ...
> 
>    ... and then here:
>    if (!prog->sleepable) {
>      guard(rcu)();
>      ret &= run_prog(prog, ctx);
>    } else {
>      ret &= run_prog(prog, ctx);
>    }"
> 
> Was this feedback addressed in v12?
> 
>> +
>> +		bpf_prog_put_recursion_context(prog);
>> +		item++;
>> +	}
>> +	bpf_reset_run_ctx(old_run_ctx);
>> +	migrate_enable();
>> +	return ret;
>> +}
>> +
>>   #else /* !CONFIG_BPF_SYSCALL */
>>   static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>>   {
> 
> 
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> 
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24787752194


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 16:36     ` Mykyta Yatsenko
@ 2026-04-22 17:00       ` Alexei Starovoitov
  2026-04-22 17:57         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2026-04-22 17:00 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bot+bpf-ci, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin Lau, Kernel Team, Eduard,
	Kumar Kartikeya Dwivedi, Peter Zijlstra, Steven Rostedt,
	Mykyta Yatsenko, Martin KaFai Lau, Yonghong Song, Chris Mason,
	Ihor Solodrai

On Wed, Apr 22, 2026 at 9:36 AM Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
> >> +
> >> +            if (prog->sleepable) {
> >> +                    guard(rcu)();
> >> +                    ret &= run_prog(prog, ctx);
> >> +            } else {
> >> +                    ret &= run_prog(prog, ctx);
> >> +            }
> >
>
> Yep, this one should be inverted.

Ohh and CI was green. Looks like there is a gap in test coverage.
I thought you added a test that does something like bpf_copy_from_user.
We should have seen the splat with CONFIG_DEBUG_ATOMIC_SLEEP.
What happened?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 17:00       ` Alexei Starovoitov
@ 2026-04-22 17:57         ` Kumar Kartikeya Dwivedi
  2026-04-22 18:02           ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 16+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-04-22 17:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mykyta Yatsenko, bot+bpf-ci, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin Lau, Kernel Team, Eduard,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko, Martin KaFai Lau,
	Yonghong Song, Chris Mason, Ihor Solodrai

On Wed, 22 Apr 2026 at 19:00, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Apr 22, 2026 at 9:36 AM Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
> >
> > >> +
> > >> +            if (prog->sleepable) {
> > >> +                    guard(rcu)();
> > >> +                    ret &= run_prog(prog, ctx);
> > >> +            } else {
> > >> +                    ret &= run_prog(prog, ctx);
> > >> +            }
> > >
> >
> > Yep, this one should be inverted.
>
> Ohh and CI was green. Looks like there is a gap in test coverage.
> I thought you added a test that does something like bpf_copy_from_user.
> We should have seen the splat with config_debug_atomic_sleep.
> What happened?

CI will likely be green even with a splat, since it may not be
registered as a test failure?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 17:57         ` Kumar Kartikeya Dwivedi
@ 2026-04-22 18:02           ` Kumar Kartikeya Dwivedi
  2026-04-22 18:27             ` Mykyta Yatsenko
  0 siblings, 1 reply; 16+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-04-22 18:02 UTC (permalink / raw)
  To: Alexei Starovoitov, Ihor Solodrai
  Cc: Mykyta Yatsenko, bot+bpf-ci, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin Lau, Kernel Team, Eduard,
	Peter Zijlstra, Steven Rostedt, Mykyta Yatsenko, Martin KaFai Lau,
	Yonghong Song, Chris Mason

On Wed, 22 Apr 2026 at 19:57, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, 22 Apr 2026 at 19:00, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Apr 22, 2026 at 9:36 AM Mykyta Yatsenko
> > <mykyta.yatsenko5@gmail.com> wrote:
> > >
> > > >> +
> > > >> +            if (prog->sleepable) {
> > > >> +                    guard(rcu)();
> > > >> +                    ret &= run_prog(prog, ctx);
> > > >> +            } else {
> > > >> +                    ret &= run_prog(prog, ctx);
> > > >> +            }
> > > >
> > >
> > > Yep, this one should be inverted.
> >
> > Ohh and CI was green. Looks like there is a gap in test coverage.
> > I thought you added a test that does something like bpf_copy_from_user.
> > We should have seen the splat with config_debug_atomic_sleep.
> > What happened?
>
> CI will likely be green even with splat, since it may not be
> registered as test failure?

I can see the warning here:
https://github.com/kernel-patches/bpf/actions/runs/24787752184/job/72539372270#step:7:8176
We likely need to add a post-processing step that errors
out on new BUG/WARNING lines in dmesg.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 18:02           ` Kumar Kartikeya Dwivedi
@ 2026-04-22 18:27             ` Mykyta Yatsenko
  0 siblings, 0 replies; 16+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 18:27 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Alexei Starovoitov, Ihor Solodrai
  Cc: bot+bpf-ci, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin Lau, Kernel Team, Eduard, Peter Zijlstra,
	Steven Rostedt, Mykyta Yatsenko, Martin KaFai Lau, Yonghong Song,
	Chris Mason

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Wed, 22 Apr 2026 at 19:57, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>
>> On Wed, 22 Apr 2026 at 19:00, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> >
>> > On Wed, Apr 22, 2026 at 9:36 AM Mykyta Yatsenko
>> > <mykyta.yatsenko5@gmail.com> wrote:
>> > >
>> > > >> +
>> > > >> +            if (prog->sleepable) {
>> > > >> +                    guard(rcu)();
>> > > >> +                    ret &= run_prog(prog, ctx);
>> > > >> +            } else {
>> > > >> +                    ret &= run_prog(prog, ctx);
>> > > >> +            }
>> > > >
>> > >
>> > > Yep, this one should be inverted.
>> >
>> > Ohh and CI was green. Looks like there is a gap in test coverage.
>> > I thought you added a test that does something like bpf_copy_from_user.
>> > We should have seen the splat with config_debug_atomic_sleep.
>> > What happened?
>>
>> CI will likely be green even with splat, since it may not be
>> registered as test failure?
>
> I can see the warning here:
> https://github.com/kernel-patches/bpf/actions/runs/24787752184/job/72539372270#step:7:8176
> We likely need to add some post processing step on dmesg that errors
> out on new BUG/WARNING lines in dmesg.

I remember some discussions about adding a check for splats.

One of the tests produced this:
2026-04-22T15:58:06.3608335Z [  339.347712] BUG: sleeping function called from invalid context at include/linux/uaccess.h:169
2026-04-22T15:58:06.3610984Z [  339.348335] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 125, name: new_name
2026-04-22T15:58:06.3613195Z [  339.348895] preempt_count: 0, expected: 0
2026-04-22T15:58:06.3616372Z [  339.349178] RCU nest depth: 1, expected: 0
2026-04-22T15:58:06.3618726Z [  339.349472] INFO: lockdep is turned off.
2026-04-22T15:58:06.3631799Z [  339.349760] CPU: 3 UID: 0 PID: 125 Comm: new_name Tainted: G           OE K     7.0.0-g633419d54537 #1 PREEMPT(full)
2026-04-22T15:58:06.3634411Z [  339.349766] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH
2026-04-22T15:58:06.3642066Z [  339.349768] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
2026-04-22T15:58:06.3643327Z [  339.349771] Call Trace:
2026-04-22T15:58:06.3644813Z [  339.349774]  <TASK>
2026-04-22T15:58:06.3647734Z [  339.349776]  dump_stack_lvl+0x5d/0x80
2026-04-22T15:58:06.3650841Z [  339.349788]  __might_resched.cold+0x1a7/0x21f
2026-04-22T15:58:06.3653880Z [  339.349793]  ? __pfx___might_resched+0x10/0x10
2026-04-22T15:58:06.3657586Z [  339.349797]  ? __pfx___mutex_unlock_slowpath+0x10/0x10
2026-04-22T15:58:06.3661044Z [  339.349805]  ? perf_event_attach_bpf_prog+0x14b/0x3e0
2026-04-22T15:58:06.3663190Z [  339.349811]  __might_fault+0x66/0x140
2026-04-22T15:58:06.3666611Z [  339.349818]  ? __pfx___might_resched+0x10/0x10
2026-04-22T15:58:06.3669065Z [  339.349821]  _copy_from_user+0x26/0xa0
2026-04-22T15:58:06.3672032Z [  339.349826]  bpf_copy_from_user+0x29/0x60
2026-04-22T15:58:06.3676593Z [  339.349834]  bpf_prog_6d749d25945227f7_handle_sys_enter_tp+0x5d/0x88
2026-04-22T15:58:06.3679583Z [  339.349837]  trace_call_bpf_faultable+0x460/0xa60
2026-04-22T15:58:06.3682800Z [  339.349841]  ? perf_event_ctx_lock_nested+0x193/0x360
2026-04-22T15:58:06.3686442Z [  339.349847]  ? __pfx_trace_call_bpf_faultable+0x10/0x10
2026-04-22T15:58:06.3689643Z [  339.349849]  ? __pfx__perf_ioctl+0x10/0x10
2026-04-22T15:58:06.3692583Z [  339.349856]  perf_call_bpf_enter+0x167/0x2b0
2026-04-22T15:58:06.3695171Z [  339.349862]  ? lock_release+0x256/0x2f0
2026-04-22T15:58:06.3698222Z [  339.349868]  ? __pfx_perf_call_bpf_enter+0x10/0x10
2026-04-22T15:58:06.3701676Z [  339.349874]  ? perf_syscall_enter+0x2a9/0x7d0
2026-04-22T15:58:06.3704200Z [  339.349877]  ? __fget_files+0x1b4/0x2f0
2026-04-22T15:58:06.3706543Z [  339.349883]  ? put_ctx+0x20/0x180
2026-04-22T15:58:06.3709243Z [  339.349885]  ? lock_acquire+0x2b6/0x2f0
2026-04-22T15:58:06.3711654Z [  339.349890]  ? lock_release+0x256/0x2f0
2026-04-22T15:58:06.3714661Z [  339.349894]  ? __might_fault+0xc5/0x140
2026-04-22T15:58:06.3717508Z [  339.349897]  perf_syscall_enter+0x2a9/0x7d0
2026-04-22T15:58:06.3720433Z [  339.349902]  ? __pfx_perf_syscall_enter+0x10/0x10
2026-04-22T15:58:06.3723889Z [  339.349907]  ? __pfx___might_resched+0x10/0x10
2026-04-22T15:58:06.3726935Z [  339.349909]  ? __pfx_fput_close_sync+0x10/0x10
2026-04-22T15:58:06.3729429Z [  339.349916]  trace_syscall_enter+0xb5/0x110
2026-04-22T15:58:06.3732027Z [  339.349921]  do_syscall_64+0x524/0x5f0
2026-04-22T15:58:06.3735196Z [  339.349927]  ? __irq_exit_rcu+0x40/0x1e0
2026-04-22T15:58:06.3738744Z [  339.349933]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
2026-04-22T15:58:06.3740690Z [  339.349937] RIP: 0033:0x7f47209ae28d
2026-04-22T15:58:06.3755178Z [  339.349940] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5b bb 0d 00 f7 d8 64 89 01 48
2026-04-22T15:58:06.3758736Z [  339.349944] RSP: 002b:00007ffd43505848 EFLAGS: 00000206 ORIG_RAX: 000000000000004f
2026-04-22T15:58:06.3763493Z [  339.349949] RAX: ffffffffffffffda RBX: 00007f4720f75000 RCX: 00007f47209ae28d
2026-04-22T15:58:06.3768294Z [  339.349951] RDX: 00007f472097ca9b RSI: 0000000000001000 RDI: 00007ffd43505860
2026-04-22T15:58:06.3773718Z [  339.349954] RBP: 00007ffd435068c0 R08: 0000000000000000 R09: 0000000000000000
2026-04-22T15:58:06.3778607Z [  339.349956] R10: 0000000000000064 R11: 0000000000000206 R12: 0000000000000008
2026-04-22T15:58:06.3783368Z [  339.349958] R13: 0000000000000000 R14: 00005648b9bf8550 R15: 00007f4720fb8000
2026-04-22T15:58:06.3784686Z [  339.349963]  </TASK>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 1/6] bpf: Add sleepable support for raw tracepoint programs
  2026-04-22 15:27 ` [PATCH bpf-next v12 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
@ 2026-04-22 21:43   ` sashiko-bot
  0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-04-22 21:43 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 1 potential issue(s):
- [Medium] Adding `bpf_prog_get_recursion_context()` to `__bpf_prog_test_run_raw_tp()` causes concurrent test runs of sleepable programs on the same CPU to silently fail and report success (returning 0) to user-space.
--

commit 222c8798357569a257ba54fe2c8b4e389e16e925
Author: Mykyta Yatsenko <yatsenko@meta.com>
Subject: bpf: Add sleepable support for raw tracepoint programs

This commit reworks __bpf_trace_run() and bpf_prog_test_run_raw_tp() to
support sleepable BPF programs by using explicit RCU flavor selection and
adding per-program recursion context guards.

> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
[ ... ]
> @@ -748,14 +748,35 @@ static void
>  __bpf_prog_test_run_raw_tp(void *data)
>  {
>  	struct bpf_raw_tp_test_run_info *info = data;
[ ... ]
> -	rcu_read_lock();
> +	if (info->prog->sleepable) {
> +		scp = rcu_read_lock_tasks_trace();
> +		migrate_disable();
> +	} else {
> +		rcu_read_lock();
> +	}
> +
> +	if (unlikely(!bpf_prog_get_recursion_context(info->prog))) {
> +		bpf_prog_inc_misses_counter(info->prog);
> +		goto out;
> +	}
> +
>  	info->retval = bpf_prog_run(info->prog, info->ctx);
> -	rcu_read_unlock();
> +
> +out:
> +	bpf_prog_put_recursion_context(info->prog);

Does this introduce a regression where concurrent test runs of sleepable
programs silently skip execution and report success?

For sleepable programs, execution can block. Even with migrate_disable() held,
the task can yield the CPU while the per-CPU counter from
bpf_prog_get_recursion_context() remains incremented.

If another userspace thread executes a test run for the same sleepable
program and is scheduled on the same CPU, it will read the incremented
counter and falsely detect recursion.

This causes the second run to jump out and skip bpf_prog_run(). Because
info->retval is initialized to 0 and no error is returned, the test run will
silently succeed and copy 0 back to userspace.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260422-sleepable_tracepoints-v12-0-744bf0e3b311@meta.com?part=1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 15:27 ` [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
  2026-04-22 16:06   ` bot+bpf-ci
@ 2026-04-22 22:02   ` sashiko-bot
  1 sibling, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-04-22 22:02 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 1 potential issue(s):
- [Critical] Inverted RCU lock logic in bpf_prog_run_array_sleepable() causes use-after-free and 'sleeping in atomic' panics
--

commit bed1594e26a509b74cbf793566b3e03eca1cc31d
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Add bpf_prog_run_array_sleepable()

This commit adds a new function to run BPF program arrays on faultable
tracepoints. It includes per-program recursion checking for private stack
safety and identifies dummy placeholder programs by checking if the length
is zero.

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b4b703c90ca94..c909fee369c2d 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h

[ ... ]

> @@ -3079,6 +3079,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>  void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
>  void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
>  
> +static __always_inline u32
> +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
> +			     const void *ctx, bpf_prog_run_fn run_prog)
> +{

[ ... ]

> +		run_ctx.bpf_cookie = item->bpf_cookie;
> +
> +		if (prog->sleepable) {
> +			guard(rcu)();
> +			ret &= run_prog(prog, ctx);
> +		} else {
> +			ret &= run_prog(prog, ctx);
> +		}

Should the guard(rcu)() logic be inverted here? 

Sleepable BPF programs are permitted to block and sleep to handle page
faults, but holding the RCU read lock while sleeping is generally not
permitted and can lead to "scheduling while atomic" panics.

Conversely, executing non-sleepable BPF programs without the RCU read lock
could lead to use-after-free regressions, as they rely on the execution
environment holding the RCU lock to safely access RCU-protected memory and
dynamically sized maps.

Could this be changed to if (!prog->sleepable) to match the pattern used in
bpf_prog_run_array_uprobe()?

> +
> +		bpf_prog_put_recursion_context(prog);
> +		item++;
> +	}
> +	bpf_reset_run_ctx(old_run_ctx);
> +	migrate_enable();
> +	return ret;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260422-sleepable_tracepoints-v12-0-744bf0e3b311@meta.com?part=2

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v12 3/6] bpf: Add sleepable support for classic tracepoint programs
  2026-04-22 15:27 ` [PATCH bpf-next v12 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
@ 2026-04-22 23:06   ` sashiko-bot
  0 siblings, 0 replies; 16+ messages in thread
From: sashiko-bot @ 2026-04-22 23:06 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 2 potential issue(s):
- [Critical] Inverted RCU lock logic in `bpf_prog_run_array_sleepable()` causes sleepable programs to run under `rcu_read_lock()` and non-sleepable programs to run without it.
- [High] Uninitialized kernel stack memory is leaked to BPF programs via the `param` struct in `perf_call_bpf_enter()` and `perf_call_bpf_exit()`.
--

commit 37b318a3f61844cb76121bebc8014d8031e06996
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Add sleepable support for classic tracepoint programs

This commit adds trace_call_bpf_faultable() for faultable tracepoints,
supporting sleepable BPF programs. It also restructures the perf syscall
hooks to run BPF programs before preempt_disable().

> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 4e763dd2aa2bb..bbc0710615014 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c

[ ... ]

> +unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
> +{
> +	struct bpf_prog_array *prog_array;
> +
> +	might_fault();
> +	guard(rcu_tasks_trace)();
> +
> +	prog_array = rcu_dereference_check(call->prog_array,
> +					   rcu_read_lock_trace_held());
> +	return bpf_prog_run_array_sleepable(prog_array, ctx, bpf_prog_run);
> +}

Does bpf_prog_run_array_sleepable() properly handle rcu locks for sleepable
programs?

A preceding patch introduced bpf_prog_run_array_sleepable() which appears
to have an inverted logic check:

if (prog->sleepable) {
    guard(rcu)();
    ...
}

This causes sleepable programs to execute with the RCU read lock held,
preventing them from sleeping. Conversely, non-sleepable BPF programs run
without the RCU read lock, which could lead to regressions when accessing
RCU-protected data.

> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 8ad72e17d8eb9..e98ee7e1e66f4 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -1371,33 +1371,33 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
>  static int sys_perf_refcount_enter;
>  static int sys_perf_refcount_exit;
>  
> -static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *regs,
> +static int perf_call_bpf_enter(struct trace_event_call *call,
>  			       struct syscall_metadata *sys_data,
> -			       struct syscall_trace_enter *rec)
> +			       int syscall_nr, unsigned long *args)
>  {
>  	struct syscall_tp_t {
>  		struct trace_entry ent;
>  		int syscall_nr;
>  		unsigned long args[SYSCALL_DEFINE_MAXARGS];
>  	} __aligned(8) param;
> +	struct pt_regs regs = {};

Is it possible for this code to leak uninitialized kernel stack memory to
bpf programs?

The param struct is allocated on the stack but not fully initialized. On
32-bit systems, overwriting the first few bytes only partially covers ent,
leaving ent.pid uninitialized. 

Additionally, if the system call has fewer than 6 arguments, the trailing
elements of param.args remain uninitialized and could be read by bpf
tracepoint programs.

[ ... ]

> -static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *regs,
> -			      struct syscall_trace_exit *rec)
> +static int perf_call_bpf_exit(struct trace_event_call *call,
> +			      int syscall_nr, long ret_val)
>  {
>  	struct syscall_tp_t {
>  		struct trace_entry ent;
>  		int syscall_nr;
>  		unsigned long ret;
>  	} __aligned(8) param;
> -
> -	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
> -	perf_fetch_caller_regs(regs);
> -	*(struct pt_regs **)&param = regs;
> -	param.syscall_nr = rec->nr;
> -	param.ret = rec->ret;
> -	return trace_call_bpf(call, &param);
> +	struct pt_regs regs = {};

Could this also leak uninitialized padding to bpf programs?

Similar to perf_call_bpf_enter(), param is not zero-initialized here.
On 64-bit systems, compiler-inserted padding between syscall_nr (4 bytes)
and ret (8 bytes) is left uninitialized and could also be read.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260422-sleepable_tracepoints-v12-0-744bf0e3b311@meta.com?part=3

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-04-22 23:06 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-22 15:27 [PATCH bpf-next v12 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
2026-04-22 15:27 ` [PATCH bpf-next v12 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
2026-04-22 21:43   ` sashiko-bot
2026-04-22 15:27 ` [PATCH bpf-next v12 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
2026-04-22 16:06   ` bot+bpf-ci
2026-04-22 16:36     ` Mykyta Yatsenko
2026-04-22 17:00       ` Alexei Starovoitov
2026-04-22 17:57         ` Kumar Kartikeya Dwivedi
2026-04-22 18:02           ` Kumar Kartikeya Dwivedi
2026-04-22 18:27             ` Mykyta Yatsenko
2026-04-22 22:02   ` sashiko-bot
2026-04-22 15:27 ` [PATCH bpf-next v12 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
2026-04-22 23:06   ` sashiko-bot
2026-04-22 15:27 ` [PATCH bpf-next v12 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
2026-04-22 15:27 ` [PATCH bpf-next v12 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
2026-04-22 15:27 ` [PATCH bpf-next v12 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox