* [PATCH bpf v1 0/2] Fix bpf_link grace period wait for tracepoints
@ 2026-03-30  3:21 Kumar Kartikeya Dwivedi
  2026-03-30  3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi
  2026-03-30  3:21 ` [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() Kumar Kartikeya Dwivedi
  0 siblings, 2 replies; 13+ messages in thread

From: Kumar Kartikeya Dwivedi @ 2026-03-30  3:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
      Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney,
      Steven Rostedt, kkd, kernel-team

A recent change to non-faultable tracepoints switched from
preempt-disabled critical sections to SRCU-fast, which breaks
assumptions in the bpf_link_free() path. Use call_srcu() to fix the
breakage.

While at it, retire rcu_trace_implies_rcu_gp() and clean up all
in-tree users.

Kumar Kartikeya Dwivedi (2):
  bpf: Fix grace period wait for tracepoint bpf_link
  bpf: Retire rcu_trace_implies_rcu_gp()

 include/linux/rcupdate.h   | 12 ------------
 include/linux/tracepoint.h |  8 ++++++++
 kernel/bpf/core.c          | 10 ++++------
 kernel/bpf/memalloc.c      | 33 ++++++++++---------------------
 kernel/bpf/syscall.c       | 40 ++++++++++++++++++++++----------------
 5 files changed, 45 insertions(+), 58 deletions(-)

base-commit: c369299895a591d96745d6492d4888259b004a9e
--
2.52.0

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link
  2026-03-30  3:21 [PATCH bpf v1 0/2] Fix bpf_link grace period wait for tracepoints Kumar Kartikeya Dwivedi
@ 2026-03-30  3:21 ` Kumar Kartikeya Dwivedi
  2026-03-30  3:36   ` Kumar Kartikeya Dwivedi
    ` (2 more replies)
  2026-03-30  3:21 ` [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() Kumar Kartikeya Dwivedi
  1 sibling, 3 replies; 13+ messages in thread

From: Kumar Kartikeya Dwivedi @ 2026-03-30  3:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
      Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney,
      Steven Rostedt, kkd, kernel-team

Recently, tracepoints were switched from using disabled preemption
(which acts as an RCU read-side section) to SRCU-fast when they are not
faultable. This means that to do a proper grace period wait for programs
running in such tracepoints, we must use SRCU's grace period wait. This
applies only to non-faultable tracepoints; faultable ones continue using
RCU Tasks Trace.

However, bpf_link_free() currently does call_rcu() for all cases when
the link is non-sleepable (hence, for tracepoints, non-faultable). Fix
this by doing a call_srcu() grace period wait.

As far as RCU Tasks Trace gp -> RCU gp chaining is concerned, it is
deemed unnecessary for tracepoint programs. The link and program are now
accessed either under RCU Tasks Trace protection or under SRCU-fast
protection.

The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was
meant to generalize the logic, even if it conceded an extra RCU gp wait;
however, that was unnecessary for tracepoints even before this change.
In practice no cost was paid, since rcu_trace_implies_rcu_gp() was
always true.

Hence we need not chain any SRCU gp waits after RCU Tasks Trace.
Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/tracepoint.h |  8 ++++++++
 kernel/bpf/syscall.c       | 22 ++++++++++++++++++++--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 22ca1c8b54f3..8227102a771f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -113,6 +113,10 @@ void for_each_tracepoint_in_module(struct module *mod,
  */
 #ifdef CONFIG_TRACEPOINTS
 extern struct srcu_struct tracepoint_srcu;
+static inline struct srcu_struct *tracepoint_srcu_ptr(void)
+{
+	return &tracepoint_srcu;
+}
 static inline void tracepoint_synchronize_unregister(void)
 {
 	synchronize_rcu_tasks_trace();
@@ -123,6 +127,10 @@ static inline bool tracepoint_is_faultable(struct tracepoint *tp)
 	return tp->ext && tp->ext->faultable;
 }
 #else
+static inline struct srcu_struct *tracepoint_srcu_ptr(void)
+{
+	return NULL;
+}
 static inline void tracepoint_synchronize_unregister(void)
 { }
 static inline bool tracepoint_is_faultable(struct tracepoint *tp)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 274039e36465..ab61a5ce35af 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3261,6 +3261,13 @@ static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
 	bpf_link_dealloc(link);
 }
 
+static bool bpf_link_is_tracepoint(struct bpf_link *link)
+{
+	/* Only these combinations support a tracepoint bpf_link. */
+	return link->type == BPF_LINK_TYPE_RAW_TRACEPOINT ||
+	       (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP);
+}
+
 static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
 {
 	if (rcu_trace_implies_rcu_gp())
@@ -3279,16 +3286,27 @@ static void bpf_link_free(struct bpf_link *link)
 	if (link->prog)
 		ops->release(link);
 	if (ops->dealloc_deferred) {
-		/* Schedule BPF link deallocation, which will only then
+		struct srcu_struct *tp_srcu = tracepoint_srcu_ptr();
+
+		/*
+		 * Schedule BPF link deallocation, which will only then
 		 * trigger putting BPF program refcount.
 		 * If underlying BPF program is sleepable or BPF link's target
 		 * attach hookpoint is sleepable or otherwise requires RCU GPs
 		 * to ensure link and its underlying BPF program is not
 		 * reachable anymore, we need to first wait for RCU tasks
-		 * trace sync, and then go through "classic" RCU grace period
+		 * trace sync, and then go through "classic" RCU grace period.
+		 *
+		 * For tracepoint BPF links, we need to go through SRCU grace
+		 * period wait instead when non-faultable tracepoint is used. We
+		 * don't need to chain SRCU grace period waits, however, for the
+		 * faultable case, since it exclusively uses RCU Tasks Trace.
		 */
 		if (link->sleepable || (link->prog && link->prog->sleepable))
 			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+		/* We need to do a SRCU grace period wait for tracepoint-based BPF links. */
+		else if (bpf_link_is_tracepoint(link) && tp_srcu)
+			call_srcu(tp_srcu, &link->rcu, bpf_link_defer_dealloc_rcu_gp);
 		else
 			call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
 	} else if (ops->dealloc) {
--
2.52.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link
  2026-03-30  3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi
@ 2026-03-30  3:36 ` Kumar Kartikeya Dwivedi
  2026-03-30 10:00   ` Puranjay Mohan
  2026-03-30  9:52 ` Puranjay Mohan
  2026-03-30 15:07 ` Steven Rostedt
  2 siblings, 1 reply; 13+ messages in thread

From: Kumar Kartikeya Dwivedi @ 2026-03-30  3:36 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
      Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney,
      Steven Rostedt, kkd, kernel-team

On Mon, 30 Mar 2026 at 05:21, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Recently, tracepoints were switched from using disabled preemption
> (which acts as RCU read section) to SRCU-fast when they are not
> faultable. This means that to do a proper grace period wait for programs
> running in such tracepoints, we must use SRCU's grace period wait.
> This is only for non-faultable tracepoints, faultable ones continue
> using RCU Tasks Trace.
>
> However, bpf_link_free() currently does call_rcu() for all cases when
> the link is non-sleepable (hence, for tracepoints, non-faultable). Fix
> this by doing a call_srcu() grace period wait.
>
> As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed
> unnecessary for tracepoint programs. The link and program are either
> accessed under RCU Tasks Trace protection, or SRCU-fast protection now.
>
> The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was
> to generalize the logic, even if it conceded an extra RCU gp wait,
> however that is unnecessary for tracepoints even before this change.
> In practice no cost was paid since rcu_trace_implies_rcu_gp() was always
> true.
>
> Hence we need not chain any SRCU gp waits after RCU Tasks Trace.

... or chaining RCU gp after SRCU gp, rather; the commit log should
probably say that instead. The above might be confusing.

But more eyes on this would be great. I went back and read a few
discussions on why we were chaining RCU gp after RCU-tt gp and couldn't
convince myself it was necessary for the tracepoint path.

>
> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link 2026-03-30 3:36 ` Kumar Kartikeya Dwivedi @ 2026-03-30 10:00 ` Puranjay Mohan 2026-03-30 14:03 ` Kumar Kartikeya Dwivedi 0 siblings, 1 reply; 13+ messages in thread From: Puranjay Mohan @ 2026-03-30 10:00 UTC (permalink / raw) To: Kumar Kartikeya Dwivedi, bpf Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Steven Rostedt, kkd, kernel-team Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: > On Mon, 30 Mar 2026 at 05:21, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: >> >> Recently, tracepoints were switched from using disabled preemption >> (which acts as RCU read section) to SRCU-fast when they are not >> faultable. This means that to do a proper grace period wait for programs >> running in such tracepoints, we must use SRCU's grace period wait. >> This is only for non-faultable tracepoints, faultable ones continue >> using RCU Tasks Trace. >> >> However, bpf_link_free() currently does call_rcu() for all cases when >> the link is non-sleepable (hence, for tracepoints, non-faultable). Fix >> this by doing a call_srcu() grace period wait. >> >> As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed >> unnecessary for tracepoint programs. The link and program are either >> accessed under RCU Tasks Trace protection, or SRCU-fast protection now. >> >> The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was >> to generalize the logic, even if it conceded an extra RCU gp wait, >> however that is unnecessary for tracepoints even before this change. >> In practice no cost was paid since rcu_trace_implies_rcu_gp() was always >> true. >> >> Hence we need not chain any SRCU gp waits after RCU Tasks Trace. > > ... or chaining RCP gp after SRCU gp, rather, the commit log should > probably say that instead. The above might be confusing. 
> But more eyes on this would be great, I went back and read a few
> discussions on why we were chaining RCU gp after RCU-tt gp and
> couldn't convince myself it was necessary for the tracepoint path.

Yeah the commit message is a bit hard to follow, let me try to lay out
why chaining isn't needed for either case, let me know if you agree with
this analysis:

For non-faultable tracepoints (the call_srcu path):

The tracepoint dispatch macro in __DECLARE_TRACE does:

	guard(srcu_fast_notrace)(&tracepoint_srcu);
	__DO_TRACE_CALL(name, args);

which calls into __bpf_trace_##call, which calls bpf_trace_runN, and
that ends up in __bpf_trace_run() where we have:

	struct bpf_prog *prog = link->link.prog;
	...
	rcu_read_lock_dont_migrate();
	...
	run_ctx.bpf_cookie = link->cookie;
	bpf_prog_run(prog, args);
	...
	rcu_read_unlock_migrate();

Both the link dereference (link->link.prog) and the
rcu_read_lock_dont_migrate() happen inside the SRCU-fast read section
from the tracepoint macro. So classic RCU is nested inside SRCU-fast
here. When the SRCU grace period completes, all in-flight SRCU-fast
readers have finished, which means all their nested classic RCU read
sections have also finished. No need to chain a classic RCU GP after
the SRCU GP.

For faultable tracepoints (the call_rcu_tasks_trace path):

__DECLARE_TRACE_SYSCALL uses guard(rcu_tasks_trace)() instead of
SRCU-fast, so SRCU isn't involved at all on this path. The link and
program are accessed exclusively under RCU Tasks Trace protection.
A tasks trace GP is sufficient on its own, and since tasks trace GP
implies classic RCU GP, there's nothing to chain.

So in both cases, the outermost protection (SRCU-fast or tasks trace)
is what we wait for in bpf_link_free(), and the inner
rcu_read_lock_dont_migrate() in __bpf_trace_run() is subsumed by that
outer GP wait.

Am I missing something?

Thanks,
Puranjay

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link 2026-03-30 10:00 ` Puranjay Mohan @ 2026-03-30 14:03 ` Kumar Kartikeya Dwivedi 0 siblings, 0 replies; 13+ messages in thread From: Kumar Kartikeya Dwivedi @ 2026-03-30 14:03 UTC (permalink / raw) To: Puranjay Mohan Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Steven Rostedt, kkd, kernel-team On Mon, 30 Mar 2026 at 12:00, Puranjay Mohan <puranjay@kernel.org> wrote: > > Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: > > > On Mon, 30 Mar 2026 at 05:21, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > >> > >> Recently, tracepoints were switched from using disabled preemption > >> (which acts as RCU read section) to SRCU-fast when they are not > >> faultable. This means that to do a proper grace period wait for programs > >> running in such tracepoints, we must use SRCU's grace period wait. > >> This is only for non-faultable tracepoints, faultable ones continue > >> using RCU Tasks Trace. > >> > >> However, bpf_link_free() currently does call_rcu() for all cases when > >> the link is non-sleepable (hence, for tracepoints, non-faultable). Fix > >> this by doing a call_srcu() grace period wait. > >> > >> As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed > >> unnecessary for tracepoint programs. The link and program are either > >> accessed under RCU Tasks Trace protection, or SRCU-fast protection now. > >> > >> The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was > >> to generalize the logic, even if it conceded an extra RCU gp wait, > >> however that is unnecessary for tracepoints even before this change. > >> In practice no cost was paid since rcu_trace_implies_rcu_gp() was always > >> true. > >> > >> Hence we need not chain any SRCU gp waits after RCU Tasks Trace. > > > > ... or chaining RCP gp after SRCU gp, rather, the commit log should > > probably say that instead. 
The above might be confusing. > > But more eyes on this would be great, I went back and read a few > > discussions on why we were chaining RCU gp after RCU-tt gp and > > couldn't convince myself it was necessary for the tracepoint path. > > Yeah the commit message is a bit hard to follow, let me try to lay out > why chaining isn't needed for either case, let me know if you agree with > this analysis: > > For non-faultable tracepoints (the call_srcu path): > > The tracepoint dispatch macro in __DECLARE_TRACE does: > > guard(srcu_fast_notrace)(&tracepoint_srcu); > __DO_TRACE_CALL(name, args); > > which calls into __bpf_trace_##call, which calls bpf_trace_runN, and > that ends up in __bpf_trace_run() where we have: > > struct bpf_prog *prog = link->link.prog; > ... > rcu_read_lock_dont_migrate(); > ... > run_ctx.bpf_cookie = link->cookie; > bpf_prog_run(prog, args); > ... > rcu_read_unlock_migrate(); > > Both the link dereference (link->link.prog) and the > rcu_read_lock_dont_migrate() happen inside the SRCU-fast read section > from the tracepoint macro. So classic RCU is nested inside SRCU-fast > here. When the SRCU grace period completes, all in-flight SRCU-fast > readers have finished, which means all their nested classic RCU read > sections have also finished. No need to chain a classic RCU GP after > the SRCU GP. > > For faultable tracepoints (the call_rcu_tasks_trace path): > > __DECLARE_TRACE_SYSCALL uses guard(rcu_tasks_trace)() instead of > SRCU-fast, so SRCU isn't involved at all on this path. The link and > program are accessed exclusively under RCU Tasks Trace protection. > A tasks trace GP is sufficient on its own, and since tasks trace GP > implies classic RCU GP, there's nothing to chain. > > So in both cases, the outermost protection (SRCU-fast or tasks trace) > is what we wait for in bpf_link_free(), and the inner > rcu_read_lock_dont_migrate() in __bpf_trace_run() is subsumed by that > outer GP wait. > > Am I missing something? 
No, this is my thinking as well, but thanks for laying it out so
clearly. I think it was unnecessary before too; the purpose of chaining
was to keep things generic rather than tracepoint specific. I don't see
a need for that here (or for paying an unnecessary chaining penalty with
an RCU gp after the SRCU gp, which isn't implied, unlike with the RCU-tt
gp), so I leaned toward the current change.

>
> Thanks,
> Puranjay

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link 2026-03-30 3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi 2026-03-30 3:36 ` Kumar Kartikeya Dwivedi @ 2026-03-30 9:52 ` Puranjay Mohan 2026-03-30 14:02 ` Kumar Kartikeya Dwivedi 2026-03-30 15:07 ` Steven Rostedt 2 siblings, 1 reply; 13+ messages in thread From: Puranjay Mohan @ 2026-03-30 9:52 UTC (permalink / raw) To: Kumar Kartikeya Dwivedi, bpf Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Steven Rostedt, kkd, kernel-team Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: > Recently, tracepoints were switched from using disabled preemption > (which acts as RCU read section) to SRCU-fast when they are not > faultable. This means that to do a proper grace period wait for programs > running in such tracepoints, we must use SRCU's grace period wait. > This is only for non-faultable tracepoints, faultable ones continue > using RCU Tasks Trace. > > However, bpf_link_free() currently does call_rcu() for all cases when > the link is non-sleepable (hence, for tracepoints, non-faultable). Fix > this by doing a call_srcu() grace period wait. > > As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed > unnecessary for tracepoint programs. The link and program are either > accessed under RCU Tasks Trace protection, or SRCU-fast protection now. > > The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was > to generalize the logic, even if it conceded an extra RCU gp wait, > however that is unnecessary for tracepoints even before this change. > In practice no cost was paid since rcu_trace_implies_rcu_gp() was always > true. > > Hence we need not chain any SRCU gp waits after RCU Tasks Trace. 
> > Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> With a nit below which is mostly informational. Reviewed-by: Puranjay Mohan <puranjay@kernel.org> > --- > include/linux/tracepoint.h | 8 ++++++++ > kernel/bpf/syscall.c | 22 ++++++++++++++++++++-- > 2 files changed, 28 insertions(+), 2 deletions(-) > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index 22ca1c8b54f3..8227102a771f 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -113,6 +113,10 @@ void for_each_tracepoint_in_module(struct module *mod, > */ > #ifdef CONFIG_TRACEPOINTS > extern struct srcu_struct tracepoint_srcu; > +static inline struct srcu_struct *tracepoint_srcu_ptr(void) > +{ > + return &tracepoint_srcu; > +} > static inline void tracepoint_synchronize_unregister(void) > { > synchronize_rcu_tasks_trace(); > @@ -123,6 +127,10 @@ static inline bool tracepoint_is_faultable(struct tracepoint *tp) > return tp->ext && tp->ext->faultable; > } > #else > +static inline struct srcu_struct *tracepoint_srcu_ptr(void) > +{ > + return NULL; > +} > static inline void tracepoint_synchronize_unregister(void) > { } > static inline bool tracepoint_is_faultable(struct tracepoint *tp) > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 274039e36465..ab61a5ce35af 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -3261,6 +3261,13 @@ static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu) > bpf_link_dealloc(link); > } > > +static bool bpf_link_is_tracepoint(struct bpf_link *link) > +{ > + /* Only these combinations support a tracepoint bpf_link. 
*/ > + return link->type == BPF_LINK_TYPE_RAW_TRACEPOINT || > + (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP); nit: this second check is never true here, because BPF_LINK_TYPE_TRACING uses bpf_tracing_link_lops, which has .dealloc (not .dealloc_deferred) and this function is only called from dealloc_deferred() path. > +} > + > static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) > { > if (rcu_trace_implies_rcu_gp()) > @@ -3279,16 +3286,27 @@ static void bpf_link_free(struct bpf_link *link) > if (link->prog) > ops->release(link); > if (ops->dealloc_deferred) { > - /* Schedule BPF link deallocation, which will only then > + struct srcu_struct *tp_srcu = tracepoint_srcu_ptr(); > + > + /* > + * Schedule BPF link deallocation, which will only then > * trigger putting BPF program refcount. > * If underlying BPF program is sleepable or BPF link's target > * attach hookpoint is sleepable or otherwise requires RCU GPs > * to ensure link and its underlying BPF program is not > * reachable anymore, we need to first wait for RCU tasks > - * trace sync, and then go through "classic" RCU grace period > + * trace sync, and then go through "classic" RCU grace period. > + * > + * For tracepoint BPF links, we need to go through SRCU grace > + * period wait instead when non-faultable tracepoint is used. We > + * don't need to chain SRCU grace period waits, however, for the > + * faultable case, since it exclusively uses RCU Tasks Trace. > */ > if (link->sleepable || (link->prog && link->prog->sleepable)) > call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); > + /* We need to do a SRCU grace period wait for tracepoint-based BPF links. */ > + else if (bpf_link_is_tracepoint(link) && tp_srcu) > + call_srcu(tp_srcu, &link->rcu, bpf_link_defer_dealloc_rcu_gp); > else > call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp); > } else if (ops->dealloc) { > -- > 2.52.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link 2026-03-30 9:52 ` Puranjay Mohan @ 2026-03-30 14:02 ` Kumar Kartikeya Dwivedi 0 siblings, 0 replies; 13+ messages in thread From: Kumar Kartikeya Dwivedi @ 2026-03-30 14:02 UTC (permalink / raw) To: Puranjay Mohan Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Steven Rostedt, kkd, kernel-team On Mon, 30 Mar 2026 at 11:52, Puranjay Mohan <puranjay@kernel.org> wrote: > > Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: > > > Recently, tracepoints were switched from using disabled preemption > > (which acts as RCU read section) to SRCU-fast when they are not > > faultable. This means that to do a proper grace period wait for programs > > running in such tracepoints, we must use SRCU's grace period wait. > > This is only for non-faultable tracepoints, faultable ones continue > > using RCU Tasks Trace. > > > > However, bpf_link_free() currently does call_rcu() for all cases when > > the link is non-sleepable (hence, for tracepoints, non-faultable). Fix > > this by doing a call_srcu() grace period wait. > > > > As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed > > unnecessary for tracepoint programs. The link and program are either > > accessed under RCU Tasks Trace protection, or SRCU-fast protection now. > > > > The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was > > to generalize the logic, even if it conceded an extra RCU gp wait, > > however that is unnecessary for tracepoints even before this change. > > In practice no cost was paid since rcu_trace_implies_rcu_gp() was always > > true. > > > > Hence we need not chain any SRCU gp waits after RCU Tasks Trace. > > > > Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> > > With a nit below which is mostly informational. 
> > Reviewed-by: Puranjay Mohan <puranjay@kernel.org> > > > --- > > include/linux/tracepoint.h | 8 ++++++++ > > kernel/bpf/syscall.c | 22 ++++++++++++++++++++-- > > 2 files changed, 28 insertions(+), 2 deletions(-) > > > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > index 22ca1c8b54f3..8227102a771f 100644 > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -113,6 +113,10 @@ void for_each_tracepoint_in_module(struct module *mod, > > */ > > #ifdef CONFIG_TRACEPOINTS > > extern struct srcu_struct tracepoint_srcu; > > +static inline struct srcu_struct *tracepoint_srcu_ptr(void) > > +{ > > + return &tracepoint_srcu; > > +} > > static inline void tracepoint_synchronize_unregister(void) > > { > > synchronize_rcu_tasks_trace(); > > @@ -123,6 +127,10 @@ static inline bool tracepoint_is_faultable(struct tracepoint *tp) > > return tp->ext && tp->ext->faultable; > > } > > #else > > +static inline struct srcu_struct *tracepoint_srcu_ptr(void) > > +{ > > + return NULL; > > +} > > static inline void tracepoint_synchronize_unregister(void) > > { } > > static inline bool tracepoint_is_faultable(struct tracepoint *tp) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > index 274039e36465..ab61a5ce35af 100644 > > --- a/kernel/bpf/syscall.c > > +++ b/kernel/bpf/syscall.c > > @@ -3261,6 +3261,13 @@ static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu) > > bpf_link_dealloc(link); > > } > > > > +static bool bpf_link_is_tracepoint(struct bpf_link *link) > > +{ > > + /* Only these combinations support a tracepoint bpf_link. */ > > + return link->type == BPF_LINK_TYPE_RAW_TRACEPOINT || > > + (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP); > > nit: this second check is never true here, because BPF_LINK_TYPE_TRACING > uses bpf_tracing_link_lops, which has .dealloc (not .dealloc_deferred) > and this function is only called from dealloc_deferred() path. 
bpf_raw_tp_link_attach() has a special case for this, where this
combination ends up using bpf_raw_tp_link_lops instead. I will add a
comment, since it's non-obvious. If this weren't the case, we'd have
another bug here.

>
> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link
  2026-03-30  3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi
  2026-03-30  3:36 ` Kumar Kartikeya Dwivedi
  2026-03-30  9:52 ` Puranjay Mohan
@ 2026-03-30 15:07 ` Steven Rostedt
  2026-03-30 15:27   ` Kumar Kartikeya Dwivedi
  2 siblings, 1 reply; 13+ messages in thread

From: Steven Rostedt @ 2026-03-30 15:07 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
      Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, kkd,
      kernel-team

On Mon, 30 Mar 2026 05:21:22 +0200
Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:

> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -113,6 +113,10 @@ void for_each_tracepoint_in_module(struct module *mod,
>   */
>  #ifdef CONFIG_TRACEPOINTS
>  extern struct srcu_struct tracepoint_srcu;
> +static inline struct srcu_struct *tracepoint_srcu_ptr(void)
> +{
> +	return &tracepoint_srcu;
> +}
>  static inline void tracepoint_synchronize_unregister(void)
>  {
>  	synchronize_rcu_tasks_trace();
> @@ -123,6 +127,10 @@ static inline bool tracepoint_is_faultable(struct tracepoint *tp)
>  	return tp->ext && tp->ext->faultable;
>  }
>  #else
> +static inline struct srcu_struct *tracepoint_srcu_ptr(void)
> +{
> +	return NULL;
> +}

I'd rather not export the internal workings of tracepoints like this. We
do have a tracepoint_synchronize_unregister().

Either you can add a workqueue that calls this and then frees the data.
Or possibly we can create some kind of:

	void call_tracepoint_synchronize_unregister(struct rcu_head *head, rcu_callback_t func)

helper function.

But let's not make tracepoints being protected by SRCU an exposed API.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link 2026-03-30 15:07 ` Steven Rostedt @ 2026-03-30 15:27 ` Kumar Kartikeya Dwivedi 2026-03-30 16:10 ` Steven Rostedt 0 siblings, 1 reply; 13+ messages in thread From: Kumar Kartikeya Dwivedi @ 2026-03-30 15:27 UTC (permalink / raw) To: Steven Rostedt Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, kkd, kernel-team On Mon, 30 Mar 2026 at 17:06, Steven Rostedt <rostedt@goodmis.org> wrote: > > On Mon, 30 Mar 2026 05:21:22 +0200 > Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -113,6 +113,10 @@ void for_each_tracepoint_in_module(struct module *mod, > > */ > > #ifdef CONFIG_TRACEPOINTS > > extern struct srcu_struct tracepoint_srcu; > > +static inline struct srcu_struct *tracepoint_srcu_ptr(void) > > +{ > > + return &tracepoint_srcu; > > +} > > static inline void tracepoint_synchronize_unregister(void) > > { > > synchronize_rcu_tasks_trace(); > > @@ -123,6 +127,10 @@ static inline bool tracepoint_is_faultable(struct tracepoint *tp) > > return tp->ext && tp->ext->faultable; > > } > > #else > > +static inline struct srcu_struct *tracepoint_srcu_ptr(void) > > +{ > > + return NULL; > > +} > > I rather not export the internal workings of tracepoints like this. We do > have a tracepoint_synchronize_unregister(). > > Either you can add a workqueue that calls this and then frees the > data. Or possibly we can create some kind of: > > void call_tracepoint_synchronize_unregister(struct rcu_head *head, rcu_callback_t func) > > helper function. > > But let's not make tracepoints being protected by SRCU an exposed API. This is reasonable, though a slight hiccup is that we don't want both RCU tasks trace and SRCU grace period waits here, which a generic unregister API would try to do like tracepoint_synchronize_unregister(). 
I don't think we can generically chain grace periods on our own at all.
So can we make a non-faultable variant of it, for clarity? The existing
name would be too long though; how about
call_tracepoint_unregister_non_faultable()?

>
> -- Steve
>

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link 2026-03-30 15:27 ` Kumar Kartikeya Dwivedi @ 2026-03-30 16:10 ` Steven Rostedt 0 siblings, 0 replies; 13+ messages in thread From: Steven Rostedt @ 2026-03-30 16:10 UTC (permalink / raw) To: Kumar Kartikeya Dwivedi Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, kkd, kernel-team On Mon, 30 Mar 2026 17:27:18 +0200 Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote: > This is reasonable, though a slight hiccup is that we don't want both > RCU tasks trace and SRCU grace period waits here, which a generic > unregister API would try to do like tracepoint_synchronize_unregister(). > > I don't think we can generically chain grace periods on our own at > all. So can we make a non_faultable variant for it, for clarity? The > existing name would be too long though; how about > call_tracepoint_unregister_non_faultable()? That sounds fine to me. -- Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp()
  2026-03-30  3:21 [PATCH bpf v1 0/2] Fix bpf_link grace period wait for tracepoints Kumar Kartikeya Dwivedi
  2026-03-30  3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi
@ 2026-03-30  3:21 ` Kumar Kartikeya Dwivedi
  2026-03-30 10:17   ` Puranjay Mohan
  2026-03-30 10:40   ` Paul E. McKenney
  1 sibling, 2 replies; 13+ messages in thread

From: Kumar Kartikeya Dwivedi @ 2026-03-30  3:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
      Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney,
      Steven Rostedt, kkd, kernel-team

An RCU Tasks Trace grace period implies an RCU grace period, and this
guarantee is expected to remain in the future. BPF is the only user of
this predicate, hence retire the API and clean up all in-tree users.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/rcupdate.h | 12 ------------
 kernel/bpf/core.c        | 10 ++++------
 kernel/bpf/memalloc.c    | 33 ++++++++++-----------------------
 kernel/bpf/syscall.c     | 24 ++++++------------------
 4 files changed, 20 insertions(+), 59 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 04f3f86a4145..bfa765132de8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -205,18 +205,6 @@ static inline void exit_tasks_rcu_start(void) { }
 static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
 
-/**
- * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
- *
- * As an accident of implementation, an RCU Tasks Trace grace period also
- * acts as an RCU grace period. However, this could change at any time.
- * Code relying on this accident must call this function to verify that
- * this accident is still happening.
- *
- * You have been warned!
- */ -static inline bool rcu_trace_implies_rcu_gp(void) { return true; } - /** * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU * diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7b675a451ec8..1984f061dcf4 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2641,14 +2641,12 @@ static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu) { struct bpf_prog_array *progs; - /* If RCU Tasks Trace grace period implies RCU grace period, there is - * no need to call kfree_rcu(), just call kfree() directly. + /* + * RCU Tasks Trace grace period implies RCU grace period, there is no + * need to call kfree_rcu(), just call kfree() directly. */ progs = container_of(rcu, struct bpf_prog_array, rcu); - if (rcu_trace_implies_rcu_gp()) - kfree(progs); - else - kfree_rcu(progs, rcu); + kfree(progs); } void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs) diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 682a9f34214b..e9662db7198f 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -284,17 +284,6 @@ static void __free_rcu(struct rcu_head *head) atomic_set(&c->call_rcu_ttrace_in_progress, 0); } -static void __free_rcu_tasks_trace(struct rcu_head *head) -{ - /* If RCU Tasks Trace grace period implies RCU grace period, - * there is no need to invoke call_rcu(). - */ - if (rcu_trace_implies_rcu_gp()) - __free_rcu(head); - else - call_rcu(head, __free_rcu); -} - static void enque_to_free(struct bpf_mem_cache *c, void *obj) { struct llist_node *llnode = obj; @@ -326,12 +315,12 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c) return; } - /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish. - * If RCU Tasks Trace grace period implies RCU grace period, free - * these elements directly, else use call_rcu() to wait for normal - * progs to finish and finally do free_one() on each element. + /* + * Use call_rcu_tasks_trace() to wait for sleepable progs to finish. 
+ * RCU Tasks Trace grace period implies RCU grace period, so pass + * __free_rcu directly as the callback. */ - call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace); + call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu); } static void free_bulk(struct bpf_mem_cache *c) @@ -696,20 +685,18 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma) static void free_mem_alloc(struct bpf_mem_alloc *ma) { - /* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks + /* + * waiting_for_gp[_ttrace] lists were drained, but RCU callbacks * might still execute. Wait for them. * * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(), * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used - * to wait for the pending __free_rcu_tasks_trace() and __free_rcu(), - * so if call_rcu(head, __free_rcu) is skipped due to - * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by - * using rcu_trace_implies_rcu_gp() as well. + * to wait for the pending __free_by_rcu(), and __free_rcu(). RCU Tasks + * Trace grace period implies RCU grace period, so all __free_rcu don't + * need extra call_rcu() (and thus extra rcu_barrier() here). 
*/ rcu_barrier(); /* wait for __free_by_rcu */ rcu_barrier_tasks_trace(); /* wait for __free_rcu */ - if (!rcu_trace_implies_rcu_gp()) - rcu_barrier(); free_mem_alloc_no_barrier(ma); } diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index ab61a5ce35af..dbba0a5fa96e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -941,14 +941,6 @@ static void bpf_map_free_rcu_gp(struct rcu_head *rcu) bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu)); } -static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu) -{ - if (rcu_trace_implies_rcu_gp()) - bpf_map_free_rcu_gp(rcu); - else - call_rcu(rcu, bpf_map_free_rcu_gp); -} - /* decrement map refcnt and schedule it for freeing via workqueue * (underlying map implementation ops->map_free() might sleep) */ @@ -959,8 +951,9 @@ void bpf_map_put(struct bpf_map *map) bpf_map_free_id(map); WARN_ON_ONCE(atomic64_read(&map->sleepable_refcnt)); + /* RCU tasks trace grace period implies RCU grace period. */ if (READ_ONCE(map->free_after_mult_rcu_gp)) - call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp); + call_rcu_tasks_trace(&map->rcu, bpf_map_free_rcu_gp); else if (READ_ONCE(map->free_after_rcu_gp)) call_rcu(&map->rcu, bpf_map_free_rcu_gp); else @@ -3268,14 +3261,6 @@ static bool bpf_link_is_tracepoint(struct bpf_link *link) (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP); } -static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) -{ - if (rcu_trace_implies_rcu_gp()) - bpf_link_defer_dealloc_rcu_gp(rcu); - else - call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp); -} - /* bpf_link_free is guaranteed to be called from process context */ static void bpf_link_free(struct bpf_link *link) { @@ -3301,9 +3286,12 @@ static void bpf_link_free(struct bpf_link *link) * period wait instead when non-faultable tracepoint is used. We * don't need to chain SRCU grace period waits, however, for the * faultable case, since it exclusively uses RCU Tasks Trace. 
+ * + * RCU Tasks Trace grace period implies RCU grace period, hence + * pass bpf_link_defer_dealloc_rcu_gp as callback directly. */ if (link->sleepable || (link->prog && link->prog->sleepable)) - call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); + call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_rcu_gp); /* We need to do a SRCU grace period wait for tracepoint-based BPF links. */ else if (bpf_link_is_tracepoint(link) && tp_srcu) call_srcu(tp_srcu, &link->rcu, bpf_link_defer_dealloc_rcu_gp); -- 2.52.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() 2026-03-30 3:21 ` [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() Kumar Kartikeya Dwivedi @ 2026-03-30 10:17 ` Puranjay Mohan 2026-03-30 10:40 ` Paul E. McKenney 1 sibling, 0 replies; 13+ messages in thread From: Puranjay Mohan @ 2026-03-30 10:17 UTC (permalink / raw) To: Kumar Kartikeya Dwivedi, bpf Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Steven Rostedt, kkd, kernel-team Kumar Kartikeya Dwivedi <memxor@gmail.com> writes: > RCU Tasks Trace grace period implies RCU grace period, and this > guarantee is expected to remain in the future. Only BPF is the user of > this predicate, hence retire the API and clean up all in-tree users. RCU Tasks Trace is now implemented on SRCU-fast and its grace period mechanism always has at least one call to synchronize_rcu() as it is required for SRCU-fast's correctness (it replaces the smp_mb() that SRCU-fast readers skip). So, RCU-tt GP will always imply RCU GP. > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> A nit below about a stale comment. Reviewed-by: Puranjay Mohan <puranjay@kernel.org> > --- > include/linux/rcupdate.h | 12 ------------ > kernel/bpf/core.c | 10 ++++------ > kernel/bpf/memalloc.c | 33 ++++++++++----------------------- > kernel/bpf/syscall.c | 24 ++++++------------------ > 4 files changed, 20 insertions(+), 59 deletions(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 04f3f86a4145..bfa765132de8 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -205,18 +205,6 @@ static inline void exit_tasks_rcu_start(void) { } > static inline void exit_tasks_rcu_finish(void) { } > #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */ > > -/** > - * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
> - * > - * As an accident of implementation, an RCU Tasks Trace grace period also > - * acts as an RCU grace period. However, this could change at any time. > - * Code relying on this accident must call this function to verify that > - * this accident is still happening. > - * > - * You have been warned! > - */ > -static inline bool rcu_trace_implies_rcu_gp(void) { return true; } > - There is a comment in helpers.c that needs to be dropped/updated: kernel/bpf/helpers.c:1275: /* rcu_trace_implies_rcu_gp() is true and will remain so */ > /** > * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU > * > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index 7b675a451ec8..1984f061dcf4 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -2641,14 +2641,12 @@ static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu) > { > struct bpf_prog_array *progs; > > - /* If RCU Tasks Trace grace period implies RCU grace period, there is > - * no need to call kfree_rcu(), just call kfree() directly. > + /* > + * RCU Tasks Trace grace period implies RCU grace period, there is no > + * need to call kfree_rcu(), just call kfree() directly. > */ > progs = container_of(rcu, struct bpf_prog_array, rcu); > - if (rcu_trace_implies_rcu_gp()) > - kfree(progs); > - else > - kfree_rcu(progs, rcu); > + kfree(progs); > } > > void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs) > diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c > index 682a9f34214b..e9662db7198f 100644 > --- a/kernel/bpf/memalloc.c > +++ b/kernel/bpf/memalloc.c > @@ -284,17 +284,6 @@ static void __free_rcu(struct rcu_head *head) > atomic_set(&c->call_rcu_ttrace_in_progress, 0); > } > > -static void __free_rcu_tasks_trace(struct rcu_head *head) > -{ > - /* If RCU Tasks Trace grace period implies RCU grace period, > - * there is no need to invoke call_rcu().
> - */ > - if (rcu_trace_implies_rcu_gp()) > - __free_rcu(head); > - else > - call_rcu(head, __free_rcu); > -} > - > static void enque_to_free(struct bpf_mem_cache *c, void *obj) > { > struct llist_node *llnode = obj; > @@ -326,12 +315,12 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c) > return; > } > > - /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish. > - * If RCU Tasks Trace grace period implies RCU grace period, free > - * these elements directly, else use call_rcu() to wait for normal > - * progs to finish and finally do free_one() on each element. > + /* > + * Use call_rcu_tasks_trace() to wait for sleepable progs to finish. > + * RCU Tasks Trace grace period implies RCU grace period, so pass > + * __free_rcu directly as the callback. > */ > - call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace); > + call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu); > } > > static void free_bulk(struct bpf_mem_cache *c) > @@ -696,20 +685,18 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma) > > static void free_mem_alloc(struct bpf_mem_alloc *ma) > { > - /* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks > + /* > + * waiting_for_gp[_ttrace] lists were drained, but RCU callbacks > * might still execute. Wait for them. > * > * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(), > * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used > - * to wait for the pending __free_rcu_tasks_trace() and __free_rcu(), > - * so if call_rcu(head, __free_rcu) is skipped due to > - * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by > - * using rcu_trace_implies_rcu_gp() as well. > + * to wait for the pending __free_by_rcu(), and __free_rcu(). RCU Tasks > + * Trace grace period implies RCU grace period, so all __free_rcu don't > + * need extra call_rcu() (and thus extra rcu_barrier() here). 
> */ > rcu_barrier(); /* wait for __free_by_rcu */ > rcu_barrier_tasks_trace(); /* wait for __free_rcu */ > - if (!rcu_trace_implies_rcu_gp()) > - rcu_barrier(); > free_mem_alloc_no_barrier(ma); > } > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index ab61a5ce35af..dbba0a5fa96e 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -941,14 +941,6 @@ static void bpf_map_free_rcu_gp(struct rcu_head *rcu) > bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu)); > } > > -static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu) > -{ > - if (rcu_trace_implies_rcu_gp()) > - bpf_map_free_rcu_gp(rcu); > - else > - call_rcu(rcu, bpf_map_free_rcu_gp); > -} > - > /* decrement map refcnt and schedule it for freeing via workqueue > * (underlying map implementation ops->map_free() might sleep) > */ > @@ -959,8 +951,9 @@ void bpf_map_put(struct bpf_map *map) > bpf_map_free_id(map); > > WARN_ON_ONCE(atomic64_read(&map->sleepable_refcnt)); > + /* RCU tasks trace grace period implies RCU grace period. */ > if (READ_ONCE(map->free_after_mult_rcu_gp)) > - call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp); > + call_rcu_tasks_trace(&map->rcu, bpf_map_free_rcu_gp); > else if (READ_ONCE(map->free_after_rcu_gp)) > call_rcu(&map->rcu, bpf_map_free_rcu_gp); > else > @@ -3268,14 +3261,6 @@ static bool bpf_link_is_tracepoint(struct bpf_link *link) > (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP); > } > > -static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) > -{ > - if (rcu_trace_implies_rcu_gp()) > - bpf_link_defer_dealloc_rcu_gp(rcu); > - else > - call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp); > -} > - > /* bpf_link_free is guaranteed to be called from process context */ > static void bpf_link_free(struct bpf_link *link) > { > @@ -3301,9 +3286,12 @@ static void bpf_link_free(struct bpf_link *link) > * period wait instead when non-faultable tracepoint is used. 
We > * don't need to chain SRCU grace period waits, however, for the > * faultable case, since it exclusively uses RCU Tasks Trace. > + * > + * RCU Tasks Trace grace period implies RCU grace period, hence > + * pass bpf_link_defer_dealloc_rcu_gp as callback directly. > */ > if (link->sleepable || (link->prog && link->prog->sleepable)) > - call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); > + call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_rcu_gp); > /* We need to do a SRCU grace period wait for tracepoint-based BPF links. */ > else if (bpf_link_is_tracepoint(link) && tp_srcu) > call_srcu(tp_srcu, &link->rcu, bpf_link_defer_dealloc_rcu_gp); > -- > 2.52.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() 2026-03-30 3:21 ` [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() Kumar Kartikeya Dwivedi 2026-03-30 10:17 ` Puranjay Mohan @ 2026-03-30 10:40 ` Paul E. McKenney 1 sibling, 0 replies; 13+ messages in thread From: Paul E. McKenney @ 2026-03-30 10:40 UTC (permalink / raw) To: Kumar Kartikeya Dwivedi Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Steven Rostedt, kkd, kernel-team On Mon, Mar 30, 2026 at 05:21:23AM +0200, Kumar Kartikeya Dwivedi wrote: > RCU Tasks Trace grace period implies RCU grace period, and this > guarantee is expected to remain in the future. Only BPF is the user of > this predicate, hence retire the API and clean up all in-tree users. > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> For the RCU part, given that you guys are the only users of the late great rcu_trace_implies_rcu_gp() API: Reviewed-by: Paul E. McKenney <paulmck@kernel.org> > --- > include/linux/rcupdate.h | 12 ------------ > kernel/bpf/core.c | 10 ++++------ > kernel/bpf/memalloc.c | 33 ++++++++++----------------------- > kernel/bpf/syscall.c | 24 ++++++------------------ > 4 files changed, 20 insertions(+), 59 deletions(-) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 04f3f86a4145..bfa765132de8 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -205,18 +205,6 @@ static inline void exit_tasks_rcu_start(void) { } > static inline void exit_tasks_rcu_finish(void) { } > #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */ > > -/** > - * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period? > - * > - * As an accident of implementation, an RCU Tasks Trace grace period also > - * acts as an RCU grace period. However, this could change at any time. > - * Code relying on this accident must call this function to verify that > - * this accident is still happening. 
> - * > - * You have been warned! > - */ > -static inline bool rcu_trace_implies_rcu_gp(void) { return true; } > - > /** > * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU > * > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index 7b675a451ec8..1984f061dcf4 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -2641,14 +2641,12 @@ static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu) > { > struct bpf_prog_array *progs; > > - /* If RCU Tasks Trace grace period implies RCU grace period, there is > - * no need to call kfree_rcu(), just call kfree() directly. > + /* > + * RCU Tasks Trace grace period implies RCU grace period, there is no > + * need to call kfree_rcu(), just call kfree() directly. > */ > progs = container_of(rcu, struct bpf_prog_array, rcu); > - if (rcu_trace_implies_rcu_gp()) > - kfree(progs); > - else > - kfree_rcu(progs, rcu); > + kfree(progs); > } > > void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs) > diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c > index 682a9f34214b..e9662db7198f 100644 > --- a/kernel/bpf/memalloc.c > +++ b/kernel/bpf/memalloc.c > @@ -284,17 +284,6 @@ static void __free_rcu(struct rcu_head *head) > atomic_set(&c->call_rcu_ttrace_in_progress, 0); > } > > -static void __free_rcu_tasks_trace(struct rcu_head *head) > -{ > - /* If RCU Tasks Trace grace period implies RCU grace period, > - * there is no need to invoke call_rcu(). > - */ > - if (rcu_trace_implies_rcu_gp()) > - __free_rcu(head); > - else > - call_rcu(head, __free_rcu); > -} > - > static void enque_to_free(struct bpf_mem_cache *c, void *obj) > { > struct llist_node *llnode = obj; > @@ -326,12 +315,12 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c) > return; > } > > - /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish. 
> - * If RCU Tasks Trace grace period implies RCU grace period, free > - * these elements directly, else use call_rcu() to wait for normal > - * progs to finish and finally do free_one() on each element. > + /* > + * Use call_rcu_tasks_trace() to wait for sleepable progs to finish. > + * RCU Tasks Trace grace period implies RCU grace period, so pass > + * __free_rcu directly as the callback. > */ > - call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace); > + call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu); > } > > static void free_bulk(struct bpf_mem_cache *c) > @@ -696,20 +685,18 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma) > > static void free_mem_alloc(struct bpf_mem_alloc *ma) > { > - /* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks > + /* > + * waiting_for_gp[_ttrace] lists were drained, but RCU callbacks > * might still execute. Wait for them. > * > * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(), > * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used > - * to wait for the pending __free_rcu_tasks_trace() and __free_rcu(), > - * so if call_rcu(head, __free_rcu) is skipped due to > - * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by > - * using rcu_trace_implies_rcu_gp() as well. > + * to wait for the pending __free_by_rcu(), and __free_rcu(). RCU Tasks > + * Trace grace period implies RCU grace period, so all __free_rcu don't > + * need extra call_rcu() (and thus extra rcu_barrier() here). 
> */ > rcu_barrier(); /* wait for __free_by_rcu */ > rcu_barrier_tasks_trace(); /* wait for __free_rcu */ > - if (!rcu_trace_implies_rcu_gp()) > - rcu_barrier(); > free_mem_alloc_no_barrier(ma); > } > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index ab61a5ce35af..dbba0a5fa96e 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -941,14 +941,6 @@ static void bpf_map_free_rcu_gp(struct rcu_head *rcu) > bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu)); > } > > -static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu) > -{ > - if (rcu_trace_implies_rcu_gp()) > - bpf_map_free_rcu_gp(rcu); > - else > - call_rcu(rcu, bpf_map_free_rcu_gp); > -} > - > /* decrement map refcnt and schedule it for freeing via workqueue > * (underlying map implementation ops->map_free() might sleep) > */ > @@ -959,8 +951,9 @@ void bpf_map_put(struct bpf_map *map) > bpf_map_free_id(map); > > WARN_ON_ONCE(atomic64_read(&map->sleepable_refcnt)); > + /* RCU tasks trace grace period implies RCU grace period. */ > if (READ_ONCE(map->free_after_mult_rcu_gp)) > - call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp); > + call_rcu_tasks_trace(&map->rcu, bpf_map_free_rcu_gp); > else if (READ_ONCE(map->free_after_rcu_gp)) > call_rcu(&map->rcu, bpf_map_free_rcu_gp); > else > @@ -3268,14 +3261,6 @@ static bool bpf_link_is_tracepoint(struct bpf_link *link) > (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP); > } > > -static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) > -{ > - if (rcu_trace_implies_rcu_gp()) > - bpf_link_defer_dealloc_rcu_gp(rcu); > - else > - call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp); > -} > - > /* bpf_link_free is guaranteed to be called from process context */ > static void bpf_link_free(struct bpf_link *link) > { > @@ -3301,9 +3286,12 @@ static void bpf_link_free(struct bpf_link *link) > * period wait instead when non-faultable tracepoint is used. 
We > * don't need to chain SRCU grace period waits, however, for the > * faultable case, since it exclusively uses RCU Tasks Trace. > + * > + * RCU Tasks Trace grace period implies RCU grace period, hence > + * pass bpf_link_defer_dealloc_rcu_gp as callback directly. > */ > if (link->sleepable || (link->prog && link->prog->sleepable)) > - call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); > + call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_rcu_gp); > /* We need to do a SRCU grace period wait for tracepoint-based BPF links. */ > else if (bpf_link_is_tracepoint(link) && tp_srcu) > call_srcu(tp_srcu, &link->rcu, bpf_link_defer_dealloc_rcu_gp); > -- > 2.52.0 > ^ permalink raw reply [flat|nested] 13+ messages in thread
Thread overview: 13+ messages
2026-03-30 3:21 [PATCH bpf v1 0/2] Fix bpf_link grace period wait for tracepoints Kumar Kartikeya Dwivedi
2026-03-30 3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi
2026-03-30 3:36 ` Kumar Kartikeya Dwivedi
2026-03-30 10:00 ` Puranjay Mohan
2026-03-30 14:03 ` Kumar Kartikeya Dwivedi
2026-03-30 9:52 ` Puranjay Mohan
2026-03-30 14:02 ` Kumar Kartikeya Dwivedi
2026-03-30 15:07 ` Steven Rostedt
2026-03-30 15:27 ` Kumar Kartikeya Dwivedi
2026-03-30 16:10 ` Steven Rostedt
2026-03-30 3:21 ` [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() Kumar Kartikeya Dwivedi
2026-03-30 10:17 ` Puranjay Mohan
2026-03-30 10:40 ` Paul E. McKenney