From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com,
eddyz87@gmail.com, Mykyta Yatsenko <yatsenko@meta.com>
Subject: Re: [PATCH bpf-next v5 2/5] bpf: Add sleepable support for classic tracepoint programs
Date: Mon, 23 Mar 2026 20:57:40 +0000 [thread overview]
Message-ID: <87a4vyjmbf.fsf@gmail.com> (raw)
In-Reply-To: <CAP01T75bzeb867EtX52ua2myYOxH15tkXd9P_7=7ACkSdS8wQA@mail.gmail.com>
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> On Mon, 16 Mar 2026 at 22:47, Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>>
>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
>> faultable tracepoints that supports sleepable BPF programs. It uses
>> rcu_tasks_trace for lifetime protection and bpf_prog_run_array_uprobe()
>> for per-program RCU flavor selection, following the uprobe_prog_run()
>> pattern. Uses preempt-safe this_cpu_inc_return/this_cpu_dec for the
>> bpf_prog_active recursion counter since preemption is enabled in this
>> context.
>>
>> Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
>> filter before perf event processing. Previously, BPF ran after the
>> per-cpu perf trace buffer was allocated under preempt_disable,
>> requiring cleanup via perf_swevent_put_recursion_context() on filter.
>> Now BPF runs in faultable context before preempt_disable, reading
>> syscall arguments from local variables instead of the per-cpu trace
>> record, removing the dependency on buffer allocation. This allows
>> sleepable BPF programs to execute and avoids unnecessary buffer
>> allocation when BPF filters the event. The perf event submission
>> path (buffer allocation, fill, submit) remains under preempt_disable
>> as before.
>>
>> Add an attach-time check in __perf_event_set_bpf_prog() to reject
>> sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
>> tracepoints, since only syscall tracepoints run in faultable context.
>>
>> This prepares the classic tracepoint runtime and attach paths for
>> sleepable programs. The verifier changes to allow loading sleepable
>> BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch.
>>
>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>> ---
>> include/linux/trace_events.h | 6 +++
>> kernel/events/core.c | 9 ++++
>> kernel/trace/bpf_trace.c | 39 ++++++++++++++++
>> kernel/trace/trace_syscalls.c | 104 +++++++++++++++++++++++-------------------
>> 4 files changed, 110 insertions(+), 48 deletions(-)
>>
>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>> index 37eb2f0f3dd8..5fbbeb9ec4b9 100644
>> --- a/include/linux/trace_events.h
>> +++ b/include/linux/trace_events.h
>> @@ -767,6 +767,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
>>
>> #ifdef CONFIG_BPF_EVENTS
>> unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
>> +unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx);
>> int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>> void perf_event_detach_bpf_prog(struct perf_event *event);
>> int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>> @@ -789,6 +790,11 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
>> return 1;
>> }
>>
>> +static inline unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
>> +{
>> + return 1;
>> +}
>> +
>> static inline int
>> perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
>> {
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 1f5699b339ec..46b733d3dd41 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -11647,6 +11647,15 @@ static int __perf_event_set_bpf_prog(struct perf_event *event,
>> /* only uprobe programs are allowed to be sleepable */
>> return -EINVAL;
>>
>> + if (prog->type == BPF_PROG_TYPE_TRACEPOINT && prog->sleepable) {
>> + /*
>> + * Sleepable tracepoint programs can only attach to faultable
>> + * tracepoints. Currently only syscall tracepoints are faultable.
>> + */
>> + if (!is_syscall_tp)
>> + return -EINVAL;
>> + }
>> +
>> /* Kprobe override only works for kprobes, not uprobes. */
>> if (prog->kprobe_override && !is_kprobe)
>> return -EINVAL;
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 35ed53807cfd..69c9a5539e65 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -152,6 +152,45 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
>> return ret;
>> }
>>
>> +/**
>> + * trace_call_bpf_faultable - invoke BPF program in faultable context
>> + * @call: tracepoint event
>> + * @ctx: opaque context pointer
>> + *
>> + * Variant of trace_call_bpf() for faultable tracepoints (e.g. syscall
>> + * tracepoints). Supports sleepable BPF programs by using rcu_tasks_trace
>> + * for lifetime protection and per-program rcu_read_lock for non-sleepable
>> + * programs, following the uprobe_prog_run() pattern.
>> + *
>> + * Must be called from a faultable/preemptible context.
>> + */
>> +unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
>> +{
>> + struct bpf_prog_array *prog_array;
>> + unsigned int ret;
>> +
>> + might_fault();
>> +
>> + guard(rcu_tasks_trace)();
>> + guard(migrate)();
>> +
>> + if (unlikely(this_cpu_inc_return(bpf_prog_active) != 1)) {
>
> This seems a bit heavy handed, why not
> bpf_prog_get_recursion_context()? Esp. since we can potentially sleep
> and other tasks can preempt us on the same CPU.
Agreed, the same point was raised in sashiko's review. Thanks for the
hint to look into bpf_prog_get_recursion_context(), though; I was not
aware of it.
>
>> + scoped_guard(rcu) {
>> + bpf_prog_inc_misses_counters(rcu_dereference(call->prog_array));
>> + }
>> + this_cpu_dec(bpf_prog_active);
>> + return 0;
>> + }
>> +
>> + prog_array = rcu_dereference_check(call->prog_array,
>> + rcu_read_lock_trace_held());
>> + ret = bpf_prog_run_array_uprobe(prog_array, ctx, bpf_prog_run);
>
> This confused me a bit, do we need to generalize it a bit? E.g. it
> would now set run_ctx.is_uprobe = 1 for this case too.
> Actually, except that bit, the rest looks reusable, so maybe it should
> be renamed to __ version and expose two wrappers, one that passes
> is_uprobe = false and one passing true?
> Then do bpf_prog_run_array_trace() here?
Thanks, I agree with your point.
>
>> +
>> [...]
Thread overview: 12+ messages
2026-03-16 21:46 [PATCH bpf-next v5 0/5] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 1/5] bpf: Add sleepable support for raw " Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 2/5] bpf: Add sleepable support for classic " Mykyta Yatsenko
2026-03-16 22:22 ` bot+bpf-ci
2026-03-23 20:38 ` Kumar Kartikeya Dwivedi
2026-03-23 20:57 ` Mykyta Yatsenko [this message]
2026-03-16 21:46 ` [PATCH bpf-next v5 3/5] bpf: Verifier support for sleepable " Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 4/5] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 5/5] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
-- strict thread matches above, loose matches on Subject: below --
2026-03-23 21:17 [PATCH bpf-next v5 2/5] bpf: Add sleepable support for classic " oskar
2026-03-23 21:26 ` oskar
[not found] ` <e0443380-97d3-4bcc-b599-0883bb6c6a03@gmail.com>
2026-03-24 14:32 ` Mykyta Yatsenko