public inbox for bpf@vger.kernel.org
From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net, kafai@meta.com, kernel-team@meta.com,
	eddyz87@gmail.com, Mykyta Yatsenko <yatsenko@meta.com>
Subject: Re: [PATCH bpf-next v5 2/5] bpf: Add sleepable support for classic tracepoint programs
Date: Mon, 23 Mar 2026 20:57:40 +0000	[thread overview]
Message-ID: <87a4vyjmbf.fsf@gmail.com> (raw)
In-Reply-To: <CAP01T75bzeb867EtX52ua2myYOxH15tkXd9P_7=7ACkSdS8wQA@mail.gmail.com>

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Mon, 16 Mar 2026 at 22:47, Mykyta Yatsenko
> <mykyta.yatsenko5@gmail.com> wrote:
>>
>> From: Mykyta Yatsenko <yatsenko@meta.com>
>>
>> Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for
>> faultable tracepoints that supports sleepable BPF programs. It uses
>> rcu_tasks_trace for lifetime protection and bpf_prog_run_array_uprobe()
>> for per-program RCU flavor selection, following the uprobe_prog_run()
>> pattern. Uses preempt-safe this_cpu_inc_return/this_cpu_dec for the
>> bpf_prog_active recursion counter since preemption is enabled in this
>> context.
>>
>> Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF
>> filter before perf event processing. Previously, BPF ran after the
>> per-cpu perf trace buffer was allocated under preempt_disable,
>> requiring cleanup via perf_swevent_put_recursion_context() on filter.
>> Now BPF runs in faultable context before preempt_disable, reading
>> syscall arguments from local variables instead of the per-cpu trace
>> record, removing the dependency on buffer allocation. This allows
>> sleepable BPF programs to execute and avoids unnecessary buffer
>> allocation when BPF filters the event. The perf event submission
>> path (buffer allocation, fill, submit) remains under preempt_disable
>> as before.
>>
>> Add an attach-time check in __perf_event_set_bpf_prog() to reject
>> sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall
>> tracepoints, since only syscall tracepoints run in faultable context.
>>
>> This prepares the classic tracepoint runtime and attach paths for
>> sleepable programs. The verifier changes to allow loading sleepable
>> BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch.
>>
>> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
>> ---
>>  include/linux/trace_events.h  |   6 +++
>>  kernel/events/core.c          |   9 ++++
>>  kernel/trace/bpf_trace.c      |  39 ++++++++++++++++
>>  kernel/trace/trace_syscalls.c | 104 +++++++++++++++++++++++-------------------
>>  4 files changed, 110 insertions(+), 48 deletions(-)
>>
>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>> index 37eb2f0f3dd8..5fbbeb9ec4b9 100644
>> --- a/include/linux/trace_events.h
>> +++ b/include/linux/trace_events.h
>> @@ -767,6 +767,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
>>
>>  #ifdef CONFIG_BPF_EVENTS
>>  unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
>> +unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx);
>>  int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>>  void perf_event_detach_bpf_prog(struct perf_event *event);
>>  int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>> @@ -789,6 +790,11 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
>>         return 1;
>>  }
>>
>> +static inline unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
>> +{
>> +       return 1;
>> +}
>> +
>>  static inline int
>>  perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
>>  {
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 1f5699b339ec..46b733d3dd41 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -11647,6 +11647,15 @@ static int __perf_event_set_bpf_prog(struct perf_event *event,
>>                 /* only uprobe programs are allowed to be sleepable */
>>                 return -EINVAL;
>>
>> +       if (prog->type == BPF_PROG_TYPE_TRACEPOINT && prog->sleepable) {
>> +               /*
>> +                * Sleepable tracepoint programs can only attach to faultable
>> +                * tracepoints. Currently only syscall tracepoints are faultable.
>> +                */
>> +               if (!is_syscall_tp)
>> +                       return -EINVAL;
>> +       }
>> +
>>         /* Kprobe override only works for kprobes, not uprobes. */
>>         if (prog->kprobe_override && !is_kprobe)
>>                 return -EINVAL;
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 35ed53807cfd..69c9a5539e65 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -152,6 +152,45 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
>>         return ret;
>>  }
>>
>> +/**
>> + * trace_call_bpf_faultable - invoke BPF program in faultable context
>> + * @call: tracepoint event
>> + * @ctx: opaque context pointer
>> + *
>> + * Variant of trace_call_bpf() for faultable tracepoints (e.g. syscall
>> + * tracepoints). Supports sleepable BPF programs by using rcu_tasks_trace
>> + * for lifetime protection and per-program rcu_read_lock for non-sleepable
>> + * programs, following the uprobe_prog_run() pattern.
>> + *
>> + * Must be called from a faultable/preemptible context.
>> + */
>> +unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
>> +{
>> +       struct bpf_prog_array *prog_array;
>> +       unsigned int ret;
>> +
>> +       might_fault();
>> +
>> +       guard(rcu_tasks_trace)();
>> +       guard(migrate)();
>> +
>> +       if (unlikely(this_cpu_inc_return(bpf_prog_active) != 1)) {
>
> This seems a bit heavy handed, why not
> bpf_prog_get_recursion_context()? Esp. since we can potentially sleep
> and other tasks can preempt us on the same CPU.
Agreed, the same point was raised in sashiko's review. Thanks, though, for
the hint to look into bpf_prog_get_recursion_context(); I was not
aware of it.
>
>> +               scoped_guard(rcu) {
>> +                       bpf_prog_inc_misses_counters(rcu_dereference(call->prog_array));
>> +               }
>> +               this_cpu_dec(bpf_prog_active);
>> +               return 0;
>> +       }
>> +
>> +       prog_array = rcu_dereference_check(call->prog_array,
>> +                                          rcu_read_lock_trace_held());
>> +       ret = bpf_prog_run_array_uprobe(prog_array, ctx, bpf_prog_run);
>
> This confused me a bit, do we need to generalize it a bit? E.g. it
> would now set run_ctx.is_uprobe = 1 for this case too.
> Actually, except that bit, the rest looks reusable, so maybe it should
> be renamed to __ version and expose two wrappers, one that passes
> is_uprobe = false and one passing true?
> Then do bpf_prog_run_array_trace() here?
Thanks, I agree with your point.
>
>> +
>> [...]


Thread overview: 12+ messages
2026-03-16 21:46 [PATCH bpf-next v5 0/5] bpf: Add support for sleepable tracepoint programs Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 1/5] bpf: Add sleepable support for raw " Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 2/5] bpf: Add sleepable support for classic " Mykyta Yatsenko
2026-03-16 22:22   ` bot+bpf-ci
2026-03-23 20:38   ` Kumar Kartikeya Dwivedi
2026-03-23 20:57     ` Mykyta Yatsenko [this message]
2026-03-16 21:46 ` [PATCH bpf-next v5 3/5] bpf: Verifier support for sleepable " Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 4/5] libbpf: Add section handlers for sleepable tracepoints Mykyta Yatsenko
2026-03-16 21:46 ` [PATCH bpf-next v5 5/5] selftests/bpf: Add tests for sleepable tracepoint programs Mykyta Yatsenko
  -- strict thread matches above, loose matches on Subject: below --
2026-03-23 21:17 [PATCH bpf-next v5 2/5] bpf: Add sleepable support for classic " oskar
2026-03-23 21:26 ` oskar
     [not found]   ` <e0443380-97d3-4bcc-b599-0883bb6c6a03@gmail.com>
2026-03-24 14:32     ` Mykyta Yatsenko
