public inbox for bpf@vger.kernel.org
* [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs
@ 2026-04-22 19:41 Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 19:41 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

This series adds support for sleepable BPF programs attached to raw
tracepoints (tp_btf, raw_tp) and classic tracepoints (tp).
The motivation is to allow BPF programs on syscall
tracepoints to use sleepable helpers such as bpf_copy_from_user(),
enabling reliable user memory reads that can page-fault.

This series removes the restriction that prevented sleepable BPF
programs on faultable tracepoints:

Patch 1 modifies __bpf_trace_run() to support sleepable programs.

Patch 2 introduces bpf_prog_run_array_sleepable() to support the new use case.

Patch 3 adds sleepable support for classic tracepoints
(BPF_PROG_TYPE_TRACEPOINT) by introducing trace_call_bpf_faultable()
and restructuring perf_syscall_enter/exit() to run BPF programs in
faultable context.

Patch 4 allows BPF_TRACE_RAW_TP, BPF_PROG_TYPE_RAW_TRACEPOINT, and
BPF_PROG_TYPE_TRACEPOINT programs to be loaded as sleepable, with
load-time and attach-time checks to reject sleepable programs on
non-faultable tracepoints.

Patch 5 adds libbpf SEC_DEF handlers: tp_btf.s, raw_tp.s,
raw_tracepoint.s, tp.s, and tracepoint.s.

Patch 6 adds selftests covering tp_btf.s, raw_tp.s, and tp.s positive
cases using bpf_copy_from_user(), plus negative tests for non-faultable
tracepoints.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v13:
- Invert if (prog->sleepable) check in bpf_prog_run_array_sleepable()
- Link to v12: https://patch.msgid.link/20260422-sleepable_tracepoints-v12-0-744bf0e3b311@meta.com

Changes in v12:
- Style improvements in bpf_prog_run_array_sleepable(): use guard(rcu)(),
remove unnecessary defensive-programming artifacts.
- Link to v11: https://patch.msgid.link/20260421-sleepable_tracepoints-v11-0-d8ff138d6f05@meta.com

Changes in v11:
- Avoid running the dummy prog in bpf_prog_run_array_sleepable()
- Migrate selftests from nanosleep() to getcwd() to avoid issues with
differing struct layouts.
- Link to v10: https://patch.msgid.link/20260415-sleepable_tracepoints-v10-0-161f40b33dd7@meta.com

Changes in v10:
- Guard per-prog recursion check in bpf_prog_run_array_sleepable()
  with prog->active NULL check, following the same pattern as
  commit 7dc211c1159d for prog->stats. dummy_bpf_prog has NULL
  active field and can appear in the array via
  bpf_prog_array_delete_safe() fallback on allocation failure.
- Link to v9: https://patch.msgid.link/20260410-sleepable_tracepoints-v9-0-e719e664e84c@meta.com

Changes in v9:
- Fixed "classic raw tracepoints" to "raw tracepoints (tp_btf, raw_tp)"
in commit message
- Added bpf_prog_get_recursion_context() guard to
__bpf_prog_test_run_raw_tp() to protect per-CPU private stack from
concurrent sleepable test runs
- Added new bpf_prog_run_array_sleepable() without is_uprobe parameter,
removed all changes in bpf_prog_run_array_uprobe()
- Refactored attach_tp() to use prefix array uniformly (matching
attach_raw_tp() pattern), removing hardcoded strcmp() bare-name checks.
- Recursion check in __bpf_prog_test_run_raw_tp()
- Refactored selftests
- Link to v8: https://patch.msgid.link/20260330-sleepable_tracepoints-v8-0-2e323467f3a0@meta.com

Changes in v8:
- Fix sleepable tracepoint support in bpf_prog_test_run() (Kumar, sashiko)
- Link to v7: https://patch.msgid.link/20260325-sleepable_tracepoints-v6-0-2b182dacea13@meta.com

Changes in v7:
- Add recursion check (bpf_prog_get_recursion_context()) to make sure
the private stack is safe when a sleepable program is preempted by
itself (Alexei, Kumar)
- Use combined rcu_read_lock_dont_migrate() instead of separate
rcu_read_lock()/migrate_disable() calls for non-sleepable path (Alexei)
- Link to v6: https://lore.kernel.org/bpf/20260324-sleepable_tracepoints-v6-0-81bab3a43f25@meta.com/

Changes in v6:
- Remove recursion check from trace_call_bpf_faultable(): sleepable
tracepoints are called from syscall enter/exit, so no recursion is
possible. (Kumar)
- Refactor bpf_prog_run_array_uprobe() to support the tracepoint
use case cleanly (Kumar)
- Link to v5: https://lore.kernel.org/r/20260316-sleepable_tracepoints-v5-0-85525de71d25@meta.com

Changes in v5:
- Addressed AI review: zero-initialize struct pt_regs in
perf_call_bpf_enter(); changed handling of tp.s and tracepoint.s in
attach_tp() in libbpf.
- Updated commit messages
- Link to v4: https://lore.kernel.org/r/20260313-sleepable_tracepoints-v4-0-debc688a66b3@meta.com

Changes in v4:
- Follow uprobe_prog_run() pattern with explicit rcu_read_lock_trace()
  instead of relying on outer rcu_tasks_trace lock
- Add sleepable support for classic raw tracepoints (raw_tp.s)
- Add sleepable support for classic tracepoints (tp.s) with new
  trace_call_bpf_faultable() and restructured perf_syscall_enter/exit()
- Add raw_tp.s, raw_tracepoint.s, tp.s, tracepoint.s SEC_DEF handlers
- Replace growing type enumeration in error message with generic
  "program of this type cannot be sleepable"
- Use PT_REGS_PARM1_SYSCALL (non-CO-RE) in BTF test
- Add classic raw_tp and classic tracepoint sleepable tests
- Link to v3: https://lore.kernel.org/r/20260311-sleepable_tracepoints-v3-0-3e9bbde5bd22@meta.com

Changes in v3:
  - Moved faultable tracepoint check from attach time to load time in
    bpf_check_attach_target(), providing a clear verifier error message
  - Folded preempt_disable removal into the sleepable execution path
    patch
  - Used RUN_TESTS() with __failure/__msg for negative test case instead
    of explicit userspace program
  - Reduced series from 6 patches to 4
  - Link to v2: https://lore.kernel.org/r/20260225-sleepable_tracepoints-v2-0-0330dafd650f@meta.com

Changes in v2:
  - Addressed AI review points: modified the order of the patches
  - Link to v1: https://lore.kernel.org/bpf/20260218-sleepable_tracepoints-v1-0-ec2705497208@meta.com/

---
Mykyta Yatsenko (6):
      bpf: Add sleepable support for raw tracepoint programs
      bpf: Add bpf_prog_run_array_sleepable()
      bpf: Add sleepable support for classic tracepoint programs
      bpf: Verifier support for sleepable tracepoint programs
      libbpf: Add section handlers for sleepable tracepoints
      selftests/bpf: Add tests for sleepable tracepoint programs

 include/linux/bpf.h                                |  50 ++++++++
 include/linux/trace_events.h                       |   6 +
 include/trace/bpf_probe.h                          |   2 -
 kernel/bpf/syscall.c                               |   5 +
 kernel/bpf/verifier.c                              |  13 +-
 kernel/events/core.c                               |   9 ++
 kernel/trace/bpf_trace.c                           |  48 ++++++-
 kernel/trace/trace_syscalls.c                      | 110 ++++++++--------
 net/bpf/test_run.c                                 |  65 +++++++---
 tools/lib/bpf/libbpf.c                             |  88 ++++++++-----
 .../bpf/prog_tests/sleepable_tracepoints.c         | 142 +++++++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints.c         | 112 ++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints_fail.c    |  18 +++
 tools/testing/selftests/bpf/verifier/sleepable.c   |  17 ++-
 14 files changed, 578 insertions(+), 107 deletions(-)
---
base-commit: 05103382104b8ffc701f7b2c79379a0361ecad30
change-id: 20260216-sleepable_tracepoints-381ae1410550

Best regards,
--  
Mykyta Yatsenko <yatsenko@meta.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v13 1/6] bpf: Add sleepable support for raw tracepoint programs
  2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
@ 2026-04-22 19:41 ` Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 19:41 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Rework __bpf_trace_run() to support sleepable BPF programs by using
explicit RCU flavor selection, following the uprobe_prog_run() pattern.

For sleepable programs, use rcu_read_lock_tasks_trace() for lifetime
protection with migrate_disable(). For non-sleepable programs, use the
regular rcu_read_lock_dont_migrate().

Remove the preempt_disable_notrace/preempt_enable_notrace pair from
the faultable tracepoint BPF probe wrapper in bpf_probe.h, since
migration protection and RCU locking are now handled per-program
inside __bpf_trace_run().

Adapt bpf_prog_test_run_raw_tp() for sleepable programs: reject
BPF_F_TEST_RUN_ON_CPU since sleepable programs cannot run in hardirq
or preempt-disabled context, and call __bpf_prog_test_run_raw_tp()
directly instead of via smp_call_function_single(). Rework
__bpf_prog_test_run_raw_tp() to select RCU flavor per-program and
add per-program recursion context guard for private stack safety.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/trace/bpf_probe.h |  2 --
 kernel/trace/bpf_trace.c  | 20 ++++++++++++---
 net/bpf/test_run.c        | 65 ++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 9391d54d3f12..d1de8f9aa07f 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -58,9 +58,7 @@ static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
 	might_fault();							\
-	preempt_disable_notrace();					\
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));	\
-	preempt_enable_notrace();					\
 }
 
 #undef DECLARE_EVENT_SYSCALL_CLASS
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e916f0ccbed9..7276c72c1d31 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2072,11 +2072,19 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 static __always_inline
 void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 {
+	struct srcu_ctr __percpu *scp = NULL;
 	struct bpf_prog *prog = link->link.prog;
+	bool sleepable = prog->sleepable;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
 
-	rcu_read_lock_dont_migrate();
+	if (sleepable) {
+		scp = rcu_read_lock_tasks_trace();
+		migrate_disable();
+	} else {
+		rcu_read_lock_dont_migrate();
+	}
+
 	if (unlikely(!bpf_prog_get_recursion_context(prog))) {
 		bpf_prog_inc_misses_counter(prog);
 		goto out;
@@ -2085,12 +2093,18 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 	run_ctx.bpf_cookie = link->cookie;
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 
-	(void) bpf_prog_run(prog, args);
+	(void)bpf_prog_run(prog, args);
 
 	bpf_reset_run_ctx(old_run_ctx);
 out:
 	bpf_prog_put_recursion_context(prog);
-	rcu_read_unlock_migrate();
+
+	if (sleepable) {
+		migrate_enable();
+		rcu_read_unlock_tasks_trace(scp);
+	} else {
+		rcu_read_unlock_migrate();
+	}
 }
 
 #define UNPACK(...)			__VA_ARGS__
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2bc04feadfab..c9aea7052ba7 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -748,14 +748,35 @@ static void
 __bpf_prog_test_run_raw_tp(void *data)
 {
 	struct bpf_raw_tp_test_run_info *info = data;
+	struct srcu_ctr __percpu *scp = NULL;
 	struct bpf_trace_run_ctx run_ctx = {};
 	struct bpf_run_ctx *old_run_ctx;
 
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 
-	rcu_read_lock();
+	if (info->prog->sleepable) {
+		scp = rcu_read_lock_tasks_trace();
+		migrate_disable();
+	} else {
+		rcu_read_lock();
+	}
+
+	if (unlikely(!bpf_prog_get_recursion_context(info->prog))) {
+		bpf_prog_inc_misses_counter(info->prog);
+		goto out;
+	}
+
 	info->retval = bpf_prog_run(info->prog, info->ctx);
-	rcu_read_unlock();
+
+out:
+	bpf_prog_put_recursion_context(info->prog);
+
+	if (info->prog->sleepable) {
+		migrate_enable();
+		rcu_read_unlock_tasks_trace(scp);
+	} else {
+		rcu_read_unlock();
+	}
 
 	bpf_reset_run_ctx(old_run_ctx);
 }
@@ -783,6 +804,13 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 	if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 && cpu != 0)
 		return -EINVAL;
 
+	/*
+	 * Sleepable programs cannot run with preemption disabled or in
+	 * hardirq context (smp_call_function_single), reject the flag.
+	 */
+	if (prog->sleepable && (kattr->test.flags & BPF_F_TEST_RUN_ON_CPU))
+		return -EINVAL;
+
 	if (ctx_size_in) {
 		info.ctx = memdup_user(ctx_in, ctx_size_in);
 		if (IS_ERR(info.ctx))
@@ -791,24 +819,31 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 		info.ctx = NULL;
 	}
 
+	info.retval = 0;
 	info.prog = prog;
 
-	current_cpu = get_cpu();
-	if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 ||
-	    cpu == current_cpu) {
+	if (prog->sleepable) {
 		__bpf_prog_test_run_raw_tp(&info);
-	} else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
-		/* smp_call_function_single() also checks cpu_online()
-		 * after csd_lock(). However, since cpu is from user
-		 * space, let's do an extra quick check to filter out
-		 * invalid value before smp_call_function_single().
-		 */
-		err = -ENXIO;
 	} else {
-		err = smp_call_function_single(cpu, __bpf_prog_test_run_raw_tp,
-					       &info, 1);
+		current_cpu = get_cpu();
+		if ((kattr->test.flags & BPF_F_TEST_RUN_ON_CPU) == 0 ||
+		    cpu == current_cpu) {
+			__bpf_prog_test_run_raw_tp(&info);
+		} else if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+			/*
+			 * smp_call_function_single() also checks cpu_online()
+			 * after csd_lock(). However, since cpu is from user
+			 * space, let's do an extra quick check to filter out
+			 * invalid value before smp_call_function_single().
+			 */
+			err = -ENXIO;
+		} else {
+			err = smp_call_function_single(cpu,
+						       __bpf_prog_test_run_raw_tp,
+						       &info, 1);
+		}
+		put_cpu();
 	}
-	put_cpu();
 
 	if (!err &&
 	    copy_to_user(&uattr->test.retval, &info.retval, sizeof(u32)))

-- 
2.52.0



* [PATCH bpf-next v13 2/6] bpf: Add bpf_prog_run_array_sleepable()
  2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
@ 2026-04-22 19:41 ` Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 19:41 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add bpf_prog_run_array_sleepable() for running BPF program arrays
on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it
includes per-program recursion checking for private stack safety
and hardcodes is_uprobe to false.

Skip dummy_bpf_prog at the top of the loop. When
bpf_prog_array_delete_safe() replaces a detached program with
dummy_bpf_prog on allocation failure, the dummy is statically
allocated and has NULL active, stats, and aux fields. Identify
it by prog->len == 0, since every real program has at least one
instruction.

Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cb6b9e70080..d3aea3931b85 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3079,6 +3079,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
 void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
 
+static __always_inline u32
+bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
+			     const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	struct bpf_prog *prog;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
+	u32 ret = 1;
+
+	if (unlikely(!array))
+		return ret;
+
+	migrate_disable();
+
+	run_ctx.is_uprobe = false;
+
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	item = &array->items[0];
+	while ((prog = READ_ONCE(item->prog))) {
+		/* Skip dummy_bpf_prog placeholder (len == 0) */
+		if (unlikely(!prog->len)) {
+			item++;
+			continue;
+		}
+
+		if (unlikely(!bpf_prog_get_recursion_context(prog))) {
+			bpf_prog_inc_misses_counter(prog);
+			bpf_prog_put_recursion_context(prog);
+			item++;
+			continue;
+		}
+
+		run_ctx.bpf_cookie = item->bpf_cookie;
+
+		if (!prog->sleepable) {
+			guard(rcu)();
+			ret &= run_prog(prog, ctx);
+		} else {
+			ret &= run_prog(prog, ctx);
+		}
+
+		bpf_prog_put_recursion_context(prog);
+		item++;
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	migrate_enable();
+	return ret;
+}
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {

-- 
2.52.0



* [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs
  2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
@ 2026-04-22 19:41 ` Mykyta Yatsenko
  2026-04-22 20:57   ` bot+bpf-ci
  2026-04-22 23:35   ` sashiko-bot
  2026-04-22 19:41 ` [PATCH bpf-next v13 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 19:41 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
faultable tracepoints that supports sleepable BPF programs. It uses
rcu_tasks_trace for lifetime protection and
bpf_prog_run_array_sleepable() for per-program RCU flavor selection,
following the uprobe_prog_run() pattern.

Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
programs before perf event processing. Previously, BPF ran after the
per-cpu perf trace buffer was allocated under preempt_disable,
requiring cleanup via perf_swevent_put_recursion_context() when the
program filtered the event. Now BPF runs in faultable context before
preempt_disable, reading syscall arguments from local variables
instead of the per-cpu trace record, removing the dependency on
buffer allocation. This allows sleepable BPF programs to execute and
avoids unnecessary buffer allocation when BPF filters the event.

The perf event submission path (buffer allocation, fill, submit)
remains under preempt_disable as before. Since BPF no longer runs
within the buffer allocation context, the fake_regs output parameter
to perf_trace_buf_alloc() is no longer needed and is replaced with
NULL.

Add an attach-time check in __perf_event_set_bpf_prog() to reject
sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
tracepoints, since only syscall tracepoints run in faultable context.

This prepares the classic tracepoint runtime and attach paths for
sleepable programs. The verifier changes to allow loading sleepable
BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch.

To: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> # for BPF bits
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 include/linux/trace_events.h  |   6 +++
 kernel/events/core.c          |   9 ++++
 kernel/trace/bpf_trace.c      |  28 +++++++++++
 kernel/trace/trace_syscalls.c | 110 ++++++++++++++++++++++--------------------
 4 files changed, 101 insertions(+), 52 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 40a43a4c7caf..d49338c44014 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -770,6 +770,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 
 #ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
@@ -792,6 +793,11 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
 	return 1;
 }
 
+static inline unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	return 1;
+}
+
 static inline int
 perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
 {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6d1f8bad7e1c..0f9cacfa7cb8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11643,6 +11643,15 @@ static int __perf_event_set_bpf_prog(struct perf_event *event,
 		/* only uprobe programs are allowed to be sleepable */
 		return -EINVAL;
 
+	if (prog->type == BPF_PROG_TYPE_TRACEPOINT && prog->sleepable) {
+		/*
+		 * Sleepable tracepoint programs can only attach to faultable
+		 * tracepoints. Currently only syscall tracepoints are faultable.
+		 */
+		if (!is_syscall_tp)
+			return -EINVAL;
+	}
+
 	/* Kprobe override only works for kprobes, not uprobes. */
 	if (prog->kprobe_override && !is_kprobe)
 		return -EINVAL;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7276c72c1d31..a822c589c9bd 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -152,6 +152,34 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 	return ret;
 }
 
+/**
+ * trace_call_bpf_faultable - invoke BPF program in faultable context
+ * @call: tracepoint event
+ * @ctx: opaque context pointer
+ *
+ * Variant of trace_call_bpf() for faultable tracepoints (syscall
+ * tracepoints). Supports sleepable BPF programs by using rcu_tasks_trace
+ * for lifetime protection and bpf_prog_run_array_sleepable() for per-program
+ * RCU flavor selection, following the uprobe pattern.
+ *
+ * Per-program recursion protection is provided by
+ * bpf_prog_run_array_sleepable(). Global bpf_prog_active is not
+ * needed because syscall tracepoints cannot self-recurse.
+ *
+ * Must be called from a faultable/preemptible context.
+ */
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	struct bpf_prog_array *prog_array;
+
+	might_fault();
+	guard(rcu_tasks_trace)();
+
+	prog_array = rcu_dereference_check(call->prog_array,
+					   rcu_read_lock_trace_held());
+	return bpf_prog_run_array_sleepable(prog_array, ctx, bpf_prog_run);
+}
+
 #ifdef CONFIG_BPF_KPROBE_OVERRIDE
 BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
 {
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8ad72e17d8eb..e98ee7e1e66f 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -1371,33 +1371,33 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
 static int sys_perf_refcount_enter;
 static int sys_perf_refcount_exit;
 
-static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *regs,
+static int perf_call_bpf_enter(struct trace_event_call *call,
 			       struct syscall_metadata *sys_data,
-			       struct syscall_trace_enter *rec)
+			       int syscall_nr, unsigned long *args)
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
 		int syscall_nr;
 		unsigned long args[SYSCALL_DEFINE_MAXARGS];
 	} __aligned(8) param;
+	struct pt_regs regs = {};
 	int i;
 
 	BUILD_BUG_ON(sizeof(param.ent) < sizeof(void *));
 
-	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
-	perf_fetch_caller_regs(regs);
-	*(struct pt_regs **)&param = regs;
-	param.syscall_nr = rec->nr;
+	/* bpf prog requires 'regs' to be the first member in the ctx */
+	perf_fetch_caller_regs(&regs);
+	*(struct pt_regs **)&param = &regs;
+	param.syscall_nr = syscall_nr;
 	for (i = 0; i < sys_data->nb_args; i++)
-		param.args[i] = rec->args[i];
-	return trace_call_bpf(call, &param);
+		param.args[i] = args[i];
+	return trace_call_bpf_faultable(call, &param);
 }
 
 static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_enter *rec;
-	struct pt_regs *fake_regs;
 	struct hlist_head *head;
 	unsigned long args[6];
 	bool valid_prog_array;
@@ -1410,12 +1410,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	int size = 0;
 	int uargs = 0;
 
-	/*
-	 * Syscall probe called with preemption enabled, but the ring
-	 * buffer and per-cpu data require preemption to be disabled.
-	 */
 	might_fault();
-	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
@@ -1429,6 +1424,26 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 
 	syscall_get_arguments(current, regs, args);
 
+	/*
+	 * Run BPF program in faultable context before per-cpu buffer
+	 * allocation, allowing sleepable BPF programs to execute.
+	 */
+	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
+	if (valid_prog_array &&
+	    !perf_call_bpf_enter(sys_data->enter_event, sys_data,
+				 syscall_nr, args))
+		return;
+
+	/*
+	 * Per-cpu ring buffer and perf event list operations require
+	 * preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
+	head = this_cpu_ptr(sys_data->enter_event->perf_events);
+	if (hlist_empty(head))
+		return;
+
 	/* Check if this syscall event faults in user space memory */
 	mayfault = sys_data->user_mask != 0;
 
@@ -1438,17 +1453,12 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 			return;
 	}
 
-	head = this_cpu_ptr(sys_data->enter_event->perf_events);
-	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
-	if (!valid_prog_array && hlist_empty(head))
-		return;
-
 	/* get the size after alignment with the u32 buffer size field */
 	size += sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	rec = perf_trace_buf_alloc(size, &fake_regs, &rctx);
+	rec = perf_trace_buf_alloc(size, NULL, &rctx);
 	if (!rec)
 		return;
 
@@ -1458,13 +1468,6 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	if (mayfault)
 		syscall_put_data(sys_data, rec, user_ptr, size, user_sizes, uargs);
 
-	if ((valid_prog_array &&
-	     !perf_call_bpf_enter(sys_data->enter_event, fake_regs, sys_data, rec)) ||
-	    hlist_empty(head)) {
-		perf_swevent_put_recursion_context(rctx);
-		return;
-	}
-
 	perf_trace_buf_submit(rec, size, rctx,
 			      sys_data->enter_event->event.type, 1, regs,
 			      head, NULL);
@@ -1514,40 +1517,35 @@ static void perf_sysenter_disable(struct trace_event_call *call)
 		syscall_fault_buffer_disable();
 }
 
-static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *regs,
-			      struct syscall_trace_exit *rec)
+static int perf_call_bpf_exit(struct trace_event_call *call,
+			      int syscall_nr, long ret_val)
 {
 	struct syscall_tp_t {
 		struct trace_entry ent;
 		int syscall_nr;
 		unsigned long ret;
 	} __aligned(8) param;
-
-	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
-	perf_fetch_caller_regs(regs);
-	*(struct pt_regs **)&param = regs;
-	param.syscall_nr = rec->nr;
-	param.ret = rec->ret;
-	return trace_call_bpf(call, &param);
+	struct pt_regs regs = {};
+
+	/* bpf prog requires 'regs' to be the first member in the ctx */
+	perf_fetch_caller_regs(&regs);
+	*(struct pt_regs **)&param = &regs;
+	param.syscall_nr = syscall_nr;
+	param.ret = ret_val;
+	return trace_call_bpf_faultable(call, &param);
 }
 
 static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_exit *rec;
-	struct pt_regs *fake_regs;
 	struct hlist_head *head;
 	bool valid_prog_array;
 	int syscall_nr;
 	int rctx;
 	int size;
 
-	/*
-	 * Syscall probe called with preemption enabled, but the ring
-	 * buffer and per-cpu data require preemption to be disabled.
-	 */
 	might_fault();
-	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
@@ -1559,29 +1557,37 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
-	head = this_cpu_ptr(sys_data->exit_event->perf_events);
+	/*
+	 * Run BPF program in faultable context before per-cpu buffer
+	 * allocation, allowing sleepable BPF programs to execute.
+	 */
 	valid_prog_array = bpf_prog_array_valid(sys_data->exit_event);
-	if (!valid_prog_array && hlist_empty(head))
+	if (valid_prog_array &&
+	    !perf_call_bpf_exit(sys_data->exit_event, syscall_nr,
+				syscall_get_return_value(current, regs)))
+		return;
+
+	/*
+	 * Per-cpu ring buffer and perf event list operations require
+	 * preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
+	head = this_cpu_ptr(sys_data->exit_event->perf_events);
+	if (hlist_empty(head))
 		return;
 
 	/* We can probably do that at build time */
 	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	rec = perf_trace_buf_alloc(size, &fake_regs, &rctx);
+	rec = perf_trace_buf_alloc(size, NULL, &rctx);
 	if (!rec)
 		return;
 
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	if ((valid_prog_array &&
-	     !perf_call_bpf_exit(sys_data->exit_event, fake_regs, rec)) ||
-	    hlist_empty(head)) {
-		perf_swevent_put_recursion_context(rctx);
-		return;
-	}
-
 	perf_trace_buf_submit(rec, size, rctx, sys_data->exit_event->event.type,
 			      1, regs, head, NULL);
 }

-- 
2.52.0



* [PATCH bpf-next v13 4/6] bpf: Verifier support for sleepable tracepoint programs
  2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (2 preceding siblings ...)
  2026-04-22 19:41 ` [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
@ 2026-04-22 19:41 ` Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 19:41 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Allow BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_TRACEPOINT, and
BPF_TRACE_RAW_TP (tp_btf) programs to be sleepable by adding them
to can_be_sleepable().

For BTF-based raw tracepoints (tp_btf), add a load-time check in
bpf_check_attach_target() that rejects sleepable programs attaching
to non-faultable tracepoints with a descriptive error message.

For raw tracepoints (raw_tp), add an attach-time check in
bpf_raw_tp_link_attach() that rejects sleepable programs on
non-faultable tracepoints. The attach-time check is needed because
the tracepoint name is not known at load time for raw_tp.

The attach-time check for classic tracepoints (tp) in
__perf_event_set_bpf_prog() was added in the previous patch.

Replace the verbose error message that enumerates allowed program
types with a generic "Program of this type cannot be sleepable"
message, since the list of sleepable-capable types keeps growing.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 kernel/bpf/syscall.c  |  5 +++++
 kernel/bpf/verifier.c | 13 +++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a3c0214ca934..3b1f0ba02f61 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4281,6 +4281,11 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 	if (!btp)
 		return -ENOENT;
 
+	if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+		bpf_put_raw_tracepoint(btp);
+		return -EINVAL;
+	}
+
 	link = kzalloc_obj(*link, GFP_USER);
 	if (!link) {
 		err = -ENOMEM;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 185210b73385..5b4806fdb648 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19267,6 +19267,12 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		btp = bpf_get_raw_tracepoint(tname);
 		if (!btp)
 			return -EINVAL;
+		if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+			bpf_log(log, "Sleepable program cannot attach to non-faultable tracepoint %s\n",
+				tname);
+			bpf_put_raw_tracepoint(btp);
+			return -EINVAL;
+		}
 		fname = kallsyms_lookup((unsigned long)btp->bpf_func, NULL, NULL, NULL,
 					trace_symbol);
 		bpf_put_raw_tracepoint(btp);
@@ -19483,6 +19489,7 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 		case BPF_MODIFY_RETURN:
 		case BPF_TRACE_ITER:
 		case BPF_TRACE_FSESSION:
+		case BPF_TRACE_RAW_TP:
 			return true;
 		default:
 			return false;
@@ -19490,7 +19497,9 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 	}
 	return prog->type == BPF_PROG_TYPE_LSM ||
 	       prog->type == BPF_PROG_TYPE_KPROBE /* only for uprobes */ ||
-	       prog->type == BPF_PROG_TYPE_STRUCT_OPS;
+	       prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
+	       prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT ||
+	       prog->type == BPF_PROG_TYPE_TRACEPOINT;
 }
 
 static int check_attach_btf_id(struct bpf_verifier_env *env)
@@ -19512,7 +19521,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	}
 
 	if (prog->sleepable && !can_be_sleepable(prog)) {
-		verbose(env, "Only fentry/fexit/fsession/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable\n");
+		verbose(env, "Program of this type cannot be sleepable\n");
 		return -EINVAL;
 	}
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v13 5/6] libbpf: Add section handlers for sleepable tracepoints
  2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (3 preceding siblings ...)
  2026-04-22 19:41 ` [PATCH bpf-next v13 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
@ 2026-04-22 19:41 ` Mykyta Yatsenko
  2026-04-22 19:41 ` [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
  2026-04-22 21:00 ` [PATCH bpf-next v13 0/6] bpf: Add support " patchwork-bot+netdevbpf
  6 siblings, 0 replies; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 19:41 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Add SEC_DEF entries for sleepable tracepoint variants:
  - "tp_btf.s+"     for sleepable BTF-based raw tracepoints
  - "raw_tp.s+"     for sleepable raw tracepoints
  - "raw_tracepoint.s+" (alias)
  - "tp.s+"         for sleepable classic tracepoints
  - "tracepoint.s+" (alias)

Extract sec_name_match_prefix() to share the prefix matching logic
between attach_tp() and attach_raw_tp(), eliminating duplicated
loops and hardcoded strcmp() checks for bare section names.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 tools/lib/bpf/libbpf.c | 88 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 31 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 83aae7a39d36..ab2071fdd3e8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10018,11 +10018,16 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("netkit/peer",		SCHED_CLS, BPF_NETKIT_PEER, SEC_NONE),
 	SEC_DEF("tracepoint+",		TRACEPOINT, 0, SEC_NONE, attach_tp),
 	SEC_DEF("tp+",			TRACEPOINT, 0, SEC_NONE, attach_tp),
+	SEC_DEF("tracepoint.s+",	TRACEPOINT, 0, SEC_SLEEPABLE, attach_tp),
+	SEC_DEF("tp.s+",		TRACEPOINT, 0, SEC_SLEEPABLE, attach_tp),
 	SEC_DEF("raw_tracepoint+",	RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("raw_tp+",		RAW_TRACEPOINT, 0, SEC_NONE, attach_raw_tp),
+	SEC_DEF("raw_tracepoint.s+",	RAW_TRACEPOINT, 0, SEC_SLEEPABLE, attach_raw_tp),
+	SEC_DEF("raw_tp.s+",		RAW_TRACEPOINT, 0, SEC_SLEEPABLE, attach_raw_tp),
 	SEC_DEF("raw_tracepoint.w+",	RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("raw_tp.w+",		RAW_TRACEPOINT_WRITABLE, 0, SEC_NONE, attach_raw_tp),
 	SEC_DEF("tp_btf+",		TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF, attach_trace),
+	SEC_DEF("tp_btf.s+",		TRACING, BPF_TRACE_RAW_TP, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
 	SEC_DEF("fentry+",		TRACING, BPF_TRACE_FENTRY, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("fmod_ret+",		TRACING, BPF_MODIFY_RETURN, SEC_ATTACH_BTF, attach_trace),
 	SEC_DEF("fexit+",		TRACING, BPF_TRACE_FEXIT, SEC_ATTACH_BTF, attach_trace),
@@ -13152,25 +13157,61 @@ struct bpf_link *bpf_program__attach_tracepoint(const struct bpf_program *prog,
 	return bpf_program__attach_tracepoint_opts(prog, tp_category, tp_name, NULL);
 }
 
+/*
+ * Match section name against a prefix array. Returns pointer past
+ * "prefix/" on match, empty string for bare sections (exact prefix
+ * match), or NULL if no prefix matches.
+ */
+static const char *sec_name_match_prefix(const char *sec_name,
+					 const char *const *prefixes,
+					 size_t n)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+		size_t pfx_len;
+
+		if (!str_has_pfx(sec_name, prefixes[i]))
+			continue;
+
+		pfx_len = strlen(prefixes[i]);
+		if (sec_name[pfx_len] == '\0')
+			return sec_name + pfx_len;
+
+		if (sec_name[pfx_len] != '/' || sec_name[pfx_len + 1] == '\0')
+			continue;
+
+		return sec_name + pfx_len + 1;
+	}
+	return NULL;
+}
+
 static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link)
 {
+	static const char *const prefixes[] = {
+		"tp.s",
+		"tp",
+		"tracepoint.s",
+		"tracepoint",
+	};
 	char *sec_name, *tp_cat, *tp_name;
+	const char *match;
 
 	*link = NULL;
 
-	/* no auto-attach for SEC("tp") or SEC("tracepoint") */
-	if (strcmp(prog->sec_name, "tp") == 0 || strcmp(prog->sec_name, "tracepoint") == 0)
+	match = sec_name_match_prefix(prog->sec_name, prefixes, ARRAY_SIZE(prefixes));
+	if (!match) {
+		pr_warn("prog '%s': invalid section name '%s'\n", prog->name, prog->sec_name);
+		return -EINVAL;
+	}
+	if (!match[0]) /* bare section name, no auto-attach */
 		return 0;
 
 	sec_name = strdup(prog->sec_name);
 	if (!sec_name)
 		return -ENOMEM;
 
-	/* extract "tp/<category>/<name>" or "tracepoint/<category>/<name>" */
-	if (str_has_pfx(prog->sec_name, "tp/"))
-		tp_cat = sec_name + sizeof("tp/") - 1;
-	else
-		tp_cat = sec_name + sizeof("tracepoint/") - 1;
+	tp_cat = sec_name + (match - prog->sec_name);
 	tp_name = strchr(tp_cat, '/');
 	if (!tp_name) {
 		free(sec_name);
@@ -13234,37 +13275,22 @@ static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf
 		"raw_tracepoint",
 		"raw_tp.w",
 		"raw_tracepoint.w",
+		"raw_tp.s",
+		"raw_tracepoint.s",
 	};
-	size_t i;
-	const char *tp_name = NULL;
+	const char *match;
 
 	*link = NULL;
 
-	for (i = 0; i < ARRAY_SIZE(prefixes); i++) {
-		size_t pfx_len;
-
-		if (!str_has_pfx(prog->sec_name, prefixes[i]))
-			continue;
-
-		pfx_len = strlen(prefixes[i]);
-		/* no auto-attach case of, e.g., SEC("raw_tp") */
-		if (prog->sec_name[pfx_len] == '\0')
-			return 0;
-
-		if (prog->sec_name[pfx_len] != '/')
-			continue;
-
-		tp_name = prog->sec_name + pfx_len + 1;
-		break;
-	}
-
-	if (!tp_name) {
-		pr_warn("prog '%s': invalid section name '%s'\n",
-			prog->name, prog->sec_name);
+	match = sec_name_match_prefix(prog->sec_name, prefixes, ARRAY_SIZE(prefixes));
+	if (!match) {
+		pr_warn("prog '%s': invalid section name '%s'\n", prog->name, prog->sec_name);
 		return -EINVAL;
 	}
+	if (!match[0])
+		return 0;
 
-	*link = bpf_program__attach_raw_tracepoint(prog, tp_name);
+	*link = bpf_program__attach_raw_tracepoint(prog, match);
 	return libbpf_get_error(*link);
 }
 

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs
  2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (4 preceding siblings ...)
  2026-04-22 19:41 ` [PATCH bpf-next v13 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
@ 2026-04-22 19:41 ` Mykyta Yatsenko
  2026-04-22 20:30   ` bot+bpf-ci
  2026-04-22 21:00 ` [PATCH bpf-next v13 0/6] bpf: Add support " patchwork-bot+netdevbpf
  6 siblings, 1 reply; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 19:41 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt
  Cc: Mykyta Yatsenko

From: Mykyta Yatsenko <yatsenko@meta.com>

Cover all three sleepable tracepoint types (tp_btf.s, raw_tp.s, tp.s)
and sys_exit (via bpf_task_pt_regs) with functional tests using
bpf_copy_from_user() on getcwd. Verify alias and bare SEC variants,
bpf_prog_test_run_raw_tp() with BPF_F_TEST_RUN_ON_CPU rejection,
attach-time rejection on non-faultable tracepoints, and load-time
rejection for sleepable tp_btf on non-faultable tracepoints.

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
 .../bpf/prog_tests/sleepable_tracepoints.c         | 142 +++++++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints.c         | 112 ++++++++++++++++
 .../bpf/progs/test_sleepable_tracepoints_fail.c    |  18 +++
 tools/testing/selftests/bpf/verifier/sleepable.c   |  17 ++-
 4 files changed, 287 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
new file mode 100644
index 000000000000..19500b785ee3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <test_progs.h>
+#include <unistd.h>
+#include "test_sleepable_tracepoints.skel.h"
+#include "test_sleepable_tracepoints_fail.skel.h"
+
+static void run_test(struct test_sleepable_tracepoints *skel)
+{
+	char buf[PATH_MAX] = "/";
+
+	skel->bss->target_pid = getpid();
+	skel->bss->prog_triggered = 0;
+	skel->bss->err = 0;
+	skel->bss->copied_byte = 0;
+
+	syscall(__NR_getcwd, buf, sizeof(buf));
+
+	ASSERT_EQ(skel->bss->prog_triggered, 1, "prog_triggered");
+	ASSERT_EQ(skel->bss->err, 0, "err");
+	ASSERT_EQ(skel->bss->copied_byte, '/', "copied_byte");
+}
+
+static void run_auto_attach_test(struct bpf_program *prog,
+				 struct test_sleepable_tracepoints *skel)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		return;
+
+	run_test(skel);
+	bpf_link__destroy(link);
+}
+
+static void test_attach_only(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (ASSERT_OK_PTR(link, "attach"))
+		bpf_link__destroy(link);
+}
+
+static void test_attach_reject(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_ERR_PTR(link, "attach_should_fail"))
+		bpf_link__destroy(link);
+}
+
+static void test_raw_tp_bare(struct test_sleepable_tracepoints *skel)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach_raw_tracepoint(skel->progs.handle_raw_tp_bare,
+						  "sys_enter");
+	if (ASSERT_OK_PTR(link, "attach"))
+		bpf_link__destroy(link);
+}
+
+static void test_tp_bare(struct test_sleepable_tracepoints *skel)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach_tracepoint(skel->progs.handle_tp_bare,
+					      "syscalls", "sys_enter_getcwd");
+	if (ASSERT_OK_PTR(link, "attach"))
+		bpf_link__destroy(link);
+}
+
+static void test_test_run(struct test_sleepable_tracepoints *skel)
+{
+	__u64 args[2] = {0x1234ULL, 0x5678ULL};
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.ctx_in = args,
+		.ctx_size_in = sizeof(args),
+	);
+	int fd, err;
+
+	fd = bpf_program__fd(skel->progs.handle_test_run);
+	err = bpf_prog_test_run_opts(fd, &topts);
+	ASSERT_OK(err, "test_run");
+	ASSERT_EQ(topts.retval, args[0] + args[1], "test_run_retval");
+}
+
+static void test_test_run_on_cpu_reject(struct test_sleepable_tracepoints *skel)
+{
+	__u64 args[2] = {};
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.ctx_in = args,
+		.ctx_size_in = sizeof(args),
+		.flags = BPF_F_TEST_RUN_ON_CPU,
+	);
+	int fd, err;
+
+	fd = bpf_program__fd(skel->progs.handle_test_run);
+	err = bpf_prog_test_run_opts(fd, &topts);
+	ASSERT_ERR(err, "test_run_on_cpu_reject");
+}
+
+void test_sleepable_tracepoints(void)
+{
+	struct test_sleepable_tracepoints *skel;
+
+	skel = test_sleepable_tracepoints__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	if (test__start_subtest("tp_btf"))
+		run_auto_attach_test(skel->progs.handle_sys_enter_tp_btf, skel);
+	if (test__start_subtest("raw_tp"))
+		run_auto_attach_test(skel->progs.handle_sys_enter_raw_tp, skel);
+	if (test__start_subtest("tracepoint"))
+		run_auto_attach_test(skel->progs.handle_sys_enter_tp, skel);
+	if (test__start_subtest("sys_exit"))
+		run_auto_attach_test(skel->progs.handle_sys_exit_tp, skel);
+	if (test__start_subtest("tracepoint_alias"))
+		test_attach_only(skel->progs.handle_sys_enter_tp_alias);
+	if (test__start_subtest("raw_tracepoint_alias"))
+		test_attach_only(skel->progs.handle_sys_enter_raw_tp_alias);
+	if (test__start_subtest("raw_tp_bare"))
+		test_raw_tp_bare(skel);
+	if (test__start_subtest("tp_bare"))
+		test_tp_bare(skel);
+	if (test__start_subtest("test_run"))
+		test_test_run(skel);
+	if (test__start_subtest("test_run_on_cpu_reject"))
+		test_test_run_on_cpu_reject(skel);
+	if (test__start_subtest("raw_tp_non_faultable"))
+		test_attach_reject(skel->progs.handle_raw_tp_non_faultable);
+	if (test__start_subtest("tp_non_syscall"))
+		test_attach_reject(skel->progs.handle_tp_non_syscall);
+	if (test__start_subtest("tp_btf_non_faultable_reject"))
+		RUN_TESTS(test_sleepable_tracepoints_fail);
+
+	test_sleepable_tracepoints__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c
new file mode 100644
index 000000000000..254f7fd895d9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <asm/unistd.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+int target_pid;
+int prog_triggered;
+long err;
+char copied_byte;
+
+static int copy_getcwd_arg(char *ubuf)
+{
+	err = bpf_copy_from_user(&copied_byte, sizeof(copied_byte), ubuf);
+	if (err)
+		return err;
+
+	prog_triggered = 1;
+	return 0;
+}
+
+SEC("tp_btf.s/sys_enter")
+int BPF_PROG(handle_sys_enter_tp_btf, struct pt_regs *regs, long id)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid ||
+	    id != __NR_getcwd)
+		return 0;
+
+	return copy_getcwd_arg((void *)PT_REGS_PARM1_SYSCALL(regs));
+}
+
+SEC("raw_tp.s/sys_enter")
+int BPF_PROG(handle_sys_enter_raw_tp, struct pt_regs *regs, long id)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid ||
+	    id != __NR_getcwd)
+		return 0;
+
+	return copy_getcwd_arg((void *)PT_REGS_PARM1_CORE_SYSCALL(regs));
+}
+
+SEC("tp.s/syscalls/sys_enter_getcwd")
+int handle_sys_enter_tp(struct syscall_trace_enter *args)
+{
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	return copy_getcwd_arg((void *)args->args[0]);
+}
+
+SEC("tp.s/syscalls/sys_exit_getcwd")
+int handle_sys_exit_tp(struct syscall_trace_exit *args)
+{
+	struct pt_regs *regs;
+
+	if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+		return 0;
+
+	regs = (struct pt_regs *)bpf_task_pt_regs(bpf_get_current_task_btf());
+	return copy_getcwd_arg((void *)PT_REGS_PARM1_CORE_SYSCALL(regs));
+}
+
+SEC("raw_tp.s")
+int BPF_PROG(handle_raw_tp_bare, struct pt_regs *regs, long id)
+{
+	return 0;
+}
+
+SEC("tp.s")
+int handle_tp_bare(void *ctx)
+{
+	return 0;
+}
+
+SEC("tracepoint.s/syscalls/sys_enter_getcwd")
+int handle_sys_enter_tp_alias(struct syscall_trace_enter *args)
+{
+	return 0;
+}
+
+SEC("raw_tracepoint.s/sys_enter")
+int BPF_PROG(handle_sys_enter_raw_tp_alias, struct pt_regs *regs, long id)
+{
+	return 0;
+}
+
+SEC("raw_tp.s/sys_enter")
+int BPF_PROG(handle_test_run, struct pt_regs *regs, long id)
+{
+	if ((__u64)regs == 0x1234ULL && (__u64)id == 0x5678ULL)
+		return (__u64)regs + (__u64)id;
+
+	return 0;
+}
+
+SEC("raw_tp.s/sched_switch")
+int BPF_PROG(handle_raw_tp_non_faultable, bool preempt,
+	     struct task_struct *prev, struct task_struct *next)
+{
+	return 0;
+}
+
+SEC("tp.s/sched/sched_switch")
+int handle_tp_non_syscall(void *ctx)
+{
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c
new file mode 100644
index 000000000000..1a0748a9520b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sleepable_tracepoints_fail.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+/* Sleepable program on a non-faultable tracepoint should fail to load */
+SEC("tp_btf.s/sched_switch")
+__failure __msg("Sleepable program cannot attach to non-faultable tracepoint")
+int BPF_PROG(handle_sched_switch, bool preempt,
+	     struct task_struct *prev, struct task_struct *next)
+{
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/verifier/sleepable.c b/tools/testing/selftests/bpf/verifier/sleepable.c
index c2b7f5ebf168..6dabc5522945 100644
--- a/tools/testing/selftests/bpf/verifier/sleepable.c
+++ b/tools/testing/selftests/bpf/verifier/sleepable.c
@@ -76,7 +76,20 @@
 	.runs = -1,
 },
 {
-	"sleepable raw tracepoint reject",
+	"sleepable raw tracepoint accept",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_TRACING,
+	.expected_attach_type = BPF_TRACE_RAW_TP,
+	.kfunc = "sys_enter",
+	.result = ACCEPT,
+	.flags = BPF_F_SLEEPABLE,
+	.runs = -1,
+},
+{
+	"sleepable raw tracepoint reject non-faultable",
 	.insns = {
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
@@ -85,7 +98,7 @@
 	.expected_attach_type = BPF_TRACE_RAW_TP,
 	.kfunc = "sched_switch",
 	.result = REJECT,
-	.errstr = "Only fentry/fexit/fsession/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable",
+	.errstr = "Sleepable program cannot attach to non-faultable tracepoint",
 	.flags = BPF_F_SLEEPABLE,
 	.runs = -1,
 },

-- 
2.52.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs
  2026-04-22 19:41 ` [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
@ 2026-04-22 20:30   ` bot+bpf-ci
  2026-04-22 20:37     ` Mykyta Yatsenko
  0 siblings, 1 reply; 13+ messages in thread
From: bot+bpf-ci @ 2026-04-22 20:30 UTC (permalink / raw)
  To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
	eddyz87, memxor, peterz, rostedt
  Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
	clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2653 bytes --]

> diff --git a/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c

[ ... ]

> @@ -0,0 +1,142 @@
> +void test_sleepable_tracepoints(void)
> +{
> +	struct test_sleepable_tracepoints *skel;
> +
> +	skel = test_sleepable_tracepoints__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "open_and_load"))
> +		return;
> +
> +	if (test__start_subtest("tp_btf"))
> +		run_auto_attach_test(skel->progs.handle_sys_enter_tp_btf, skel);
> +	if (test__start_subtest("raw_tp"))
> +		run_auto_attach_test(skel->progs.handle_sys_enter_raw_tp, skel);
> +	if (test__start_subtest("tracepoint"))
> +		run_auto_attach_test(skel->progs.handle_sys_enter_tp, skel);
> +	if (test__start_subtest("sys_exit"))
> +		run_auto_attach_test(skel->progs.handle_sys_exit_tp, skel);
> +	if (test__start_subtest("tracepoint_alias"))
> +		test_attach_only(skel->progs.handle_sys_enter_tp_alias);
> +	if (test__start_subtest("raw_tracepoint_alias"))
> +		test_attach_only(skel->progs.handle_sys_enter_raw_tp_alias);
> +	if (test__start_subtest("raw_tp_bare"))
> +		test_raw_tp_bare(skel);
> +	if (test__start_subtest("tp_bare"))
> +		test_tp_bare(skel);
> +	if (test__start_subtest("test_run"))
> +		test_test_run(skel);
> +	if (test__start_subtest("test_run_on_cpu_reject"))
> +		test_test_run_on_cpu_reject(skel);
> +	if (test__start_subtest("raw_tp_non_faultable"))
> +		test_attach_reject(skel->progs.handle_raw_tp_non_faultable);
> +	if (test__start_subtest("tp_non_syscall"))
> +		test_attach_reject(skel->progs.handle_tp_non_syscall);
> +	if (test__start_subtest("tp_btf_non_faultable_reject"))
> +		RUN_TESTS(test_sleepable_tracepoints_fail);
> +
> +	test_sleepable_tracepoints__destroy(skel);
> +}

In v8, Andrii Nakryiko suggested simplifying this test structure by using
an array-based approach instead of individual if statements for each test
case:

  "why not have an array

  struct { const char *name; struct bpf_program *prog } cases[] = {
     { "tp_btf", skel->progs.handle_sys_enter_tp_btf},
      ...
  }

  and then do the tests in a loop?"

Reference: https://lore.kernel.org/bpf/CAEf4BzZ_YOb1MNpUxuhFESqRWARFiBJ6yGZ-Z8xBqdCvqi_Bow@mail.gmail.com/

Would an array-driven approach reduce repetition and make it easier to
maintain as new test cases are added?

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24799223877

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs
  2026-04-22 20:30   ` bot+bpf-ci
@ 2026-04-22 20:37     ` Mykyta Yatsenko
  2026-04-22 20:49       ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 13+ messages in thread
From: Mykyta Yatsenko @ 2026-04-22 20:37 UTC (permalink / raw)
  To: bot+bpf-ci, bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87,
	memxor, peterz, rostedt
  Cc: yatsenko, martin.lau, yonghong.song, clm, ihor.solodrai



On 4/22/26 9:30 PM, bot+bpf-ci@kernel.org wrote:
>> diff --git a/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
> 
> [ ... ]
> 
>> @@ -0,0 +1,142 @@
>> +void test_sleepable_tracepoints(void)
>> +{
>> +	struct test_sleepable_tracepoints *skel;
>> +
>> +	skel = test_sleepable_tracepoints__open_and_load();
>> +	if (!ASSERT_OK_PTR(skel, "open_and_load"))
>> +		return;
>> +
>> +	if (test__start_subtest("tp_btf"))
>> +		run_auto_attach_test(skel->progs.handle_sys_enter_tp_btf, skel);
>> +	if (test__start_subtest("raw_tp"))
>> +		run_auto_attach_test(skel->progs.handle_sys_enter_raw_tp, skel);
>> +	if (test__start_subtest("tracepoint"))
>> +		run_auto_attach_test(skel->progs.handle_sys_enter_tp, skel);
>> +	if (test__start_subtest("sys_exit"))
>> +		run_auto_attach_test(skel->progs.handle_sys_exit_tp, skel);
>> +	if (test__start_subtest("tracepoint_alias"))
>> +		test_attach_only(skel->progs.handle_sys_enter_tp_alias);
>> +	if (test__start_subtest("raw_tracepoint_alias"))
>> +		test_attach_only(skel->progs.handle_sys_enter_raw_tp_alias);
>> +	if (test__start_subtest("raw_tp_bare"))
>> +		test_raw_tp_bare(skel);
>> +	if (test__start_subtest("tp_bare"))
>> +		test_tp_bare(skel);
>> +	if (test__start_subtest("test_run"))
>> +		test_test_run(skel);
>> +	if (test__start_subtest("test_run_on_cpu_reject"))
>> +		test_test_run_on_cpu_reject(skel);
>> +	if (test__start_subtest("raw_tp_non_faultable"))
>> +		test_attach_reject(skel->progs.handle_raw_tp_non_faultable);
>> +	if (test__start_subtest("tp_non_syscall"))
>> +		test_attach_reject(skel->progs.handle_tp_non_syscall);
>> +	if (test__start_subtest("tp_btf_non_faultable_reject"))
>> +		RUN_TESTS(test_sleepable_tracepoints_fail);
>> +
>> +	test_sleepable_tracepoints__destroy(skel);
>> +}
> 
> In v8, Andrii Nakryiko suggested simplifying this test structure by using
> an array-based approach instead of individual if statements for each test
> case:
> 
>    "why not have an array
> 
>    struct { const char *name; struct bpf_program *prog } cases[] = {
>       { "tp_btf", skel->progs.handle_sys_enter_tp_btf},
>        ...
>    }
> 
>    and then do the tests in a loop?"
> 
> Reference: https://lore.kernel.org/bpf/CAEf4BzZ_YOb1MNpUxuhFESqRWARFiBJ6yGZ-Z8xBqdCvqi_Bow@mail.gmail.com/
> 
> Would an array-driven approach reduce repetition and make it easier to
> maintain as new test cases are added?
> 
I did use an array-driven approach around v10, but after a few
de-claudings and simplifications I ended up with this structure, which
is more common in our selftests. This implementation reuses test logic
via generic functions such as test_attach_only() and
run_auto_attach_test(). I find it even better this way.

> [ ... ]
> 
> 
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> 
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24799223877


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs
  2026-04-22 20:37     ` Mykyta Yatsenko
@ 2026-04-22 20:49       ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 13+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2026-04-22 20:49 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bot+bpf-ci, bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87,
	peterz, rostedt, yatsenko, martin.lau, yonghong.song, clm,
	ihor.solodrai

On Wed, 22 Apr 2026 at 22:37, Mykyta Yatsenko
<mykyta.yatsenko5@gmail.com> wrote:
>
>
>
> On 4/22/26 9:30 PM, bot+bpf-ci@kernel.org wrote:
> >> diff --git a/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
> >> --- /dev/null
> >> +++ b/tools/testing/selftests/bpf/prog_tests/sleepable_tracepoints.c
> >
> > [ ... ]
> >
> >> @@ -0,0 +1,142 @@
> >> +void test_sleepable_tracepoints(void)
> >> +{
> >> +    struct test_sleepable_tracepoints *skel;
> >> +
> >> +    skel = test_sleepable_tracepoints__open_and_load();
> >> +    if (!ASSERT_OK_PTR(skel, "open_and_load"))
> >> +            return;
> >> +
> >> +    if (test__start_subtest("tp_btf"))
> >> +            run_auto_attach_test(skel->progs.handle_sys_enter_tp_btf, skel);
> >> +    if (test__start_subtest("raw_tp"))
> >> +            run_auto_attach_test(skel->progs.handle_sys_enter_raw_tp, skel);
> >> +    if (test__start_subtest("tracepoint"))
> >> +            run_auto_attach_test(skel->progs.handle_sys_enter_tp, skel);
> >> +    if (test__start_subtest("sys_exit"))
> >> +            run_auto_attach_test(skel->progs.handle_sys_exit_tp, skel);
> >> +    if (test__start_subtest("tracepoint_alias"))
> >> +            test_attach_only(skel->progs.handle_sys_enter_tp_alias);
> >> +    if (test__start_subtest("raw_tracepoint_alias"))
> >> +            test_attach_only(skel->progs.handle_sys_enter_raw_tp_alias);
> >> +    if (test__start_subtest("raw_tp_bare"))
> >> +            test_raw_tp_bare(skel);
> >> +    if (test__start_subtest("tp_bare"))
> >> +            test_tp_bare(skel);
> >> +    if (test__start_subtest("test_run"))
> >> +            test_test_run(skel);
> >> +    if (test__start_subtest("test_run_on_cpu_reject"))
> >> +            test_test_run_on_cpu_reject(skel);
> >> +    if (test__start_subtest("raw_tp_non_faultable"))
> >> +            test_attach_reject(skel->progs.handle_raw_tp_non_faultable);
> >> +    if (test__start_subtest("tp_non_syscall"))
> >> +            test_attach_reject(skel->progs.handle_tp_non_syscall);
> >> +    if (test__start_subtest("tp_btf_non_faultable_reject"))
> >> +            RUN_TESTS(test_sleepable_tracepoints_fail);
> >> +
> >> +    test_sleepable_tracepoints__destroy(skel);
> >> +}
> >
> > In v8, Andrii Nakryiko suggested simplifying this test structure by using
> > an array-based approach instead of individual if statements for each test
> > case:
> >
> >    "why not have an array
> >
> >    struct { const char *name; struct bpf_program *prog } cases[] = {
> >       { "tp_btf", skel->progs.handle_sys_enter_tp_btf},
> >        ...
> >    }
> >
> >    and then do the tests in a loop?"
> >
> > Reference: https://lore.kernel.org/bpf/CAEf4BzZ_YOb1MNpUxuhFESqRWARFiBJ6yGZ-Z8xBqdCvqi_Bow@mail.gmail.com/
> >
> > Would an array-driven approach reduce repetition and make it easier to
> > maintain as new test cases are added?
> >
> I did use an array-driven approach around v10, but after a few
> de-claudings and simplifications I ended up with this structure, which
> is more common in our selftests. This implementation reuses test logic
> via generic functions such as test_attach_only() and
> run_auto_attach_test(). I find it even better this way.
>

Yeah, I think it's fine this way; it's a matter of preference.

> > [ ... ]
> >
> >
> > ---
> > AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> > See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> >
> > CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24799223877
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs
  2026-04-22 19:41 ` [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
@ 2026-04-22 20:57   ` bot+bpf-ci
  2026-04-22 23:35   ` sashiko-bot
  1 sibling, 0 replies; 13+ messages in thread
From: bot+bpf-ci @ 2026-04-22 20:57 UTC (permalink / raw)
  To: mykyta.yatsenko5, bpf, ast, andrii, daniel, kafai, kernel-team,
	eddyz87, memxor, peterz, rostedt
  Cc: yatsenko, ast, andrii, daniel, martin.lau, eddyz87, yonghong.song,
	clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2981 bytes --]

> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 8ad72e17d8eb..e98ee7e1e66f 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c

[ ... ]

> @@ -1410,12 +1410,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
>  	int size = 0;
>  	int uargs = 0;
>
> -	/*
> -	 * Syscall probe called with preemption enabled, but the ring
> -	 * buffer and per-cpu data require preemption to be disabled.
> -	 */
>  	might_fault();
> -	guard(preempt_notrace)();
>
>  	syscall_nr = trace_get_syscall_nr(current, regs);
>  	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
> @@ -1429,6 +1424,26 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
>
>  	syscall_get_arguments(current, regs, args);
>
> +	/*
> +	 * Run BPF program in faultable context before per-cpu buffer
> +	 * allocation, allowing sleepable BPF programs to execute.
> +	 */
> +	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
> +	if (valid_prog_array &&
> +	    !perf_call_bpf_enter(sys_data->enter_event, sys_data,
> +				 syscall_nr, args))
> +		return;
> +
> +	/*
> +	 * Per-cpu ring buffer and perf event list operations require
> +	 * preemption to be disabled.
> +	 */
> +	guard(preempt_notrace)();
> +
> +	head = this_cpu_ptr(sys_data->enter_event->perf_events);
> +	if (hlist_empty(head))
> +		return;
> +

Could this lead to unnecessary buffer allocation in a race scenario?

The patch moves BPF execution before preempt_disable to enable sleepable
programs. The new code checks valid_prog_array outside preempt_disable
and runs BPF if true. Then it enters preempt_disable and checks if the
hlist is empty.

Between the BPF check (outside preempt_disable) and the hlist check
(inside preempt_disable), another CPU could unregister the last perf
event, making the hlist empty. In this window, the code would:

1. Run BPF and pass the filter (before preempt_disable)
2. Check valid_prog_array and find it true (before preempt_disable)
3. Enter preempt_disable via guard(preempt_notrace)()
4. Check hlist_empty(head) and find it false (inside preempt_disable)
5. Allocate buffer via perf_trace_buf_alloc()
6. Submit to an empty or nearly-empty hlist

The old code checked both conditions together under preempt_disable
before buffer allocation:

    head = this_cpu_ptr(sys_data->enter_event->perf_events);
    valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
    if (!valid_prog_array && hlist_empty(head))
        return;  /* Early exit before buffer allocation */

If the perf event is unregistered between the new code's BPF execution
and the hlist check, would the buffer allocation and data copying be
wasted work?

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24799223877

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs
  2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
                   ` (5 preceding siblings ...)
  2026-04-22 19:41 ` [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
@ 2026-04-22 21:00 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-04-22 21:00 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, ast, andrii, daniel, kafai, kernel-team, eddyz87, memxor,
	peterz, rostedt, yatsenko

Hello:

This series was applied to bpf/bpf-next.git (master)
by Kumar Kartikeya Dwivedi <memxor@gmail.com>:

On Wed, 22 Apr 2026 12:41:05 -0700 you wrote:
> This series adds support for sleepable BPF programs attached to raw
> tracepoints (tp_btf, raw_tp) and classic tracepoints (tp).
> The motivation is to allow BPF programs on syscall
> tracepoints to use sleepable helpers such as bpf_copy_from_user(),
> enabling reliable user memory reads that can page-fault.
> 
> This series removes restriction for faultable tracepoints:
> 
> [...]

Here is the summary with links:
  - [bpf-next,v13,1/6] bpf: Add sleepable support for raw tracepoint programs
    https://git.kernel.org/bpf/bpf-next/c/439ebd5b5708
  - [bpf-next,v13,2/6] bpf: Add bpf_prog_run_array_sleepable()
    https://git.kernel.org/bpf/bpf-next/c/12628ffaf98b
  - [bpf-next,v13,3/6] bpf: Add sleepable support for classic tracepoint programs
    https://git.kernel.org/bpf/bpf-next/c/57918341dd19
  - [bpf-next,v13,4/6] bpf: Verifier support for sleepable tracepoint programs
    https://git.kernel.org/bpf/bpf-next/c/8cfb77d30920
  - [bpf-next,v13,5/6] libbpf: Add section handlers for sleepable tracepoints
    https://git.kernel.org/bpf/bpf-next/c/0cd420a6f40c
  - [bpf-next,v13,6/6] selftests/bpf: Add tests for sleepable tracepoint programs
    https://git.kernel.org/bpf/bpf-next/c/8a20655749c6

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs
  2026-04-22 19:41 ` [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
  2026-04-22 20:57   ` bot+bpf-ci
@ 2026-04-22 23:35   ` sashiko-bot
  1 sibling, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-04-22 23:35 UTC (permalink / raw)
  To: Mykyta Yatsenko; +Cc: bpf

Sashiko AI review found 2 potential issue(s):
- [High] Uninitialized kernel stack memory leaked to BPF programs via `param` structure.
- [High] Moving BPF program execution before `syscall_get_data()` breaks `bpf_probe_read_user()` for non-sleepable programs by removing implicit page faulting.
--

commit 6d42174eb2c9a4cc877bdf6b63e5b37b8017a109
Author: Mykyta Yatsenko <yatsenko@meta.com>

bpf: Add sleepable support for classic tracepoint programs

This commit adds support for sleepable BPF programs on syscall tracepoints
by moving the BPF program execution out of the non-preemptible ring buffer
allocation path. It introduces trace_call_bpf_faultable() and restructures
perf_syscall_enter() and perf_syscall_exit().

> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 8ad72e17d8eb9..e98ee7e1e66f4 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -1371,33 +1371,33 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
>  static int sys_perf_refcount_enter;
>  static int sys_perf_refcount_exit;
>  
> -static int perf_call_bpf_enter(struct trace_event_call *call, struct pt_regs *regs,
> +static int perf_call_bpf_enter(struct trace_event_call *call,
>  			       struct syscall_metadata *sys_data,
> -			       struct syscall_trace_enter *rec)
> +			       int syscall_nr, unsigned long *args)
>  {
>  	struct syscall_tp_t {
>  		struct trace_entry ent;
>  		int syscall_nr;
>  		unsigned long args[SYSCALL_DEFINE_MAXARGS];
>  	} __aligned(8) param;
> +	struct pt_regs regs = {};

Could param be zero-initialized here? 

Because param is allocated on the kernel stack without zero-initialization,
there is a 4-byte padding hole between syscall_nr and args. Additionally, only
the first sys_data->nb_args elements of the args array are initialized, leaving
the remaining elements uninitialized.

Since the BPF verifier allows tracing programs to read the full context size,
this padding and the uninitialized array elements can be read by BPF programs,
leaking kernel stack memory. Zero-initializing param (e.g., via memset(), which
also clears the padding bytes) would prevent this.

[ ... ]

> @@ -1429,6 +1424,26 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
>  
>  	syscall_get_arguments(current, regs, args);
>  
> +	/*
> +	 * Run BPF program in faultable context before per-cpu buffer
> +	 * allocation, allowing sleepable BPF programs to execute.
> +	 */
> +	valid_prog_array = bpf_prog_array_valid(sys_data->enter_event);
> +	if (valid_prog_array &&
> +	    !perf_call_bpf_enter(sys_data->enter_event, sys_data,
> +				 syscall_nr, args))
> +		return;
> +

Does moving BPF program execution here break bpf_probe_read_user() for
existing non-sleepable BPF programs?

Previously, BPF programs ran after syscall_get_data() was called, which
explicitly faulted in user memory for string arguments. Because the memory
became resident, non-sleepable BPF programs could use bpf_probe_read_user()
to reliably read those arguments without taking page faults.

Now that the BPF program runs before syscall_get_data(), if the user memory
is paged out, bpf_probe_read_user() in non-sleepable programs might fail
with -EFAULT. 

[ ... ]

> -static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_regs *regs,
> -			      struct syscall_trace_exit *rec)
> +static int perf_call_bpf_exit(struct trace_event_call *call,
> +			      int syscall_nr, long ret_val)
>  {
>  	struct syscall_tp_t {
>  		struct trace_entry ent;
>  		int syscall_nr;
>  		unsigned long ret;
>  	} __aligned(8) param;
> -
> -	/* bpf prog requires 'regs' to be the first member in the ctx (a.k.a. &param) */
> -	perf_fetch_caller_regs(regs);
> -	*(struct pt_regs **)&param = regs;
> -	param.syscall_nr = rec->nr;
> -	param.ret = rec->ret;
> -	return trace_call_bpf(call, &param);
> +	struct pt_regs regs = {};

Similarly to perf_call_bpf_enter(), could param be zero-initialized here
to avoid leaking kernel stack memory through the padding between syscall_nr
and ret?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260422-sleepable_tracepoints-v13-0-99005dff21ef@meta.com?part=3

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-04-22 23:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-22 19:41 [PATCH bpf-next v13 0/6] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
2026-04-22 19:41 ` [PATCH bpf-next v13 1/6] bpf: Add sleepable support for raw " Mykyta Yatsenko
2026-04-22 19:41 ` [PATCH bpf-next v13 2/6] bpf: Add bpf_prog_run_array_sleepable() Mykyta Yatsenko
2026-04-22 19:41 ` [PATCH bpf-next v13 3/6] bpf: Add sleepable support for classic tracepoint programs Mykyta Yatsenko
2026-04-22 20:57   ` bot+bpf-ci
2026-04-22 23:35   ` sashiko-bot
2026-04-22 19:41 ` [PATCH bpf-next v13 4/6] bpf: Verifier support for sleepable " Mykyta Yatsenko
2026-04-22 19:41 ` [PATCH bpf-next v13 5/6] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
2026-04-22 19:41 ` [PATCH bpf-next v13 6/6] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
2026-04-22 20:30   ` bot+bpf-ci
2026-04-22 20:37     ` Mykyta Yatsenko
2026-04-22 20:49       ` Kumar Kartikeya Dwivedi
2026-04-22 21:00 ` [PATCH bpf-next v13 0/6] bpf: Add support " patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox