From: Puranjay Mohan <puranjay@kernel.org>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
kkd@meta.com, kernel-team@meta.com
Subject: Re: [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp()
Date: Mon, 30 Mar 2026 11:17:29 +0100 [thread overview]
Message-ID: <m2341h1v0m.fsf@kernel.org> (raw)
In-Reply-To: <20260330032124.3141001-3-memxor@gmail.com>
Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
> RCU Tasks Trace grace period implies RCU grace period, and this
> guarantee is expected to remain in the future. Only BPF is the user of
> this predicate, hence retire the API and clean up all in-tree users.
RCU Tasks Trace is now implemented on SRCU-fast and its grace period
mechanism always has atleast one call to synchronize_rcu() as it is
required for SRCU-fast's correctness (it replaces the smp_mb() that
SRCU-fast readers skip). So, RCU-tt GP will always imply RCU GP.
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
A nit below about a stale comment.
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> include/linux/rcupdate.h | 12 ------------
> kernel/bpf/core.c | 10 ++++------
> kernel/bpf/memalloc.c | 33 ++++++++++-----------------------
> kernel/bpf/syscall.c | 24 ++++++------------------
> 4 files changed, 20 insertions(+), 59 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 04f3f86a4145..bfa765132de8 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -205,18 +205,6 @@ static inline void exit_tasks_rcu_start(void) { }
> static inline void exit_tasks_rcu_finish(void) { }
> #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
>
> -/**
> - * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
> - *
> - * As an accident of implementation, an RCU Tasks Trace grace period also
> - * acts as an RCU grace period. However, this could change at any time.
> - * Code relying on this accident must call this function to verify that
> - * this accident is still happening.
> - *
> - * You have been warned!
> - */
> -static inline bool rcu_trace_implies_rcu_gp(void) { return true; }
> -
There is comment in helpers.c that needs to be dropped/updated:
kernel/bpf/helpers.c:1275: /* rcu_trace_implies_rcu_gp() is true and will remain so */
> /**
> * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
> *
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 7b675a451ec8..1984f061dcf4 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2641,14 +2641,12 @@ static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu)
> {
> struct bpf_prog_array *progs;
>
> - /* If RCU Tasks Trace grace period implies RCU grace period, there is
> - * no need to call kfree_rcu(), just call kfree() directly.
> + /*
> + * RCU Tasks Trace grace period implies RCU grace period, there is no
> + * need to call kfree_rcu(), just call kfree() directly.
> */
> progs = container_of(rcu, struct bpf_prog_array, rcu);
> - if (rcu_trace_implies_rcu_gp())
> - kfree(progs);
> - else
> - kfree_rcu(progs, rcu);
> + kfree(progs);
> }
>
> void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs)
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 682a9f34214b..e9662db7198f 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -284,17 +284,6 @@ static void __free_rcu(struct rcu_head *head)
> atomic_set(&c->call_rcu_ttrace_in_progress, 0);
> }
>
> -static void __free_rcu_tasks_trace(struct rcu_head *head)
> -{
> - /* If RCU Tasks Trace grace period implies RCU grace period,
> - * there is no need to invoke call_rcu().
> - */
> - if (rcu_trace_implies_rcu_gp())
> - __free_rcu(head);
> - else
> - call_rcu(head, __free_rcu);
> -}
> -
> static void enque_to_free(struct bpf_mem_cache *c, void *obj)
> {
> struct llist_node *llnode = obj;
> @@ -326,12 +315,12 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
> return;
> }
>
> - /* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> - * If RCU Tasks Trace grace period implies RCU grace period, free
> - * these elements directly, else use call_rcu() to wait for normal
> - * progs to finish and finally do free_one() on each element.
> + /*
> + * Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> + * RCU Tasks Trace grace period implies RCU grace period, so pass
> + * __free_rcu directly as the callback.
> */
> - call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace);
> + call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu);
> }
>
> static void free_bulk(struct bpf_mem_cache *c)
> @@ -696,20 +685,18 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
>
> static void free_mem_alloc(struct bpf_mem_alloc *ma)
> {
> - /* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks
> + /*
> + * waiting_for_gp[_ttrace] lists were drained, but RCU callbacks
> * might still execute. Wait for them.
> *
> * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(),
> * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used
> - * to wait for the pending __free_rcu_tasks_trace() and __free_rcu(),
> - * so if call_rcu(head, __free_rcu) is skipped due to
> - * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by
> - * using rcu_trace_implies_rcu_gp() as well.
> + * to wait for the pending __free_by_rcu(), and __free_rcu(). RCU Tasks
> + * Trace grace period implies RCU grace period, so all __free_rcu don't
> + * need extra call_rcu() (and thus extra rcu_barrier() here).
> */
> rcu_barrier(); /* wait for __free_by_rcu */
> rcu_barrier_tasks_trace(); /* wait for __free_rcu */
> - if (!rcu_trace_implies_rcu_gp())
> - rcu_barrier();
> free_mem_alloc_no_barrier(ma);
> }
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ab61a5ce35af..dbba0a5fa96e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -941,14 +941,6 @@ static void bpf_map_free_rcu_gp(struct rcu_head *rcu)
> bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu));
> }
>
> -static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu)
> -{
> - if (rcu_trace_implies_rcu_gp())
> - bpf_map_free_rcu_gp(rcu);
> - else
> - call_rcu(rcu, bpf_map_free_rcu_gp);
> -}
> -
> /* decrement map refcnt and schedule it for freeing via workqueue
> * (underlying map implementation ops->map_free() might sleep)
> */
> @@ -959,8 +951,9 @@ void bpf_map_put(struct bpf_map *map)
> bpf_map_free_id(map);
>
> WARN_ON_ONCE(atomic64_read(&map->sleepable_refcnt));
> + /* RCU tasks trace grace period implies RCU grace period. */
> if (READ_ONCE(map->free_after_mult_rcu_gp))
> - call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp);
> + call_rcu_tasks_trace(&map->rcu, bpf_map_free_rcu_gp);
> else if (READ_ONCE(map->free_after_rcu_gp))
> call_rcu(&map->rcu, bpf_map_free_rcu_gp);
> else
> @@ -3268,14 +3261,6 @@ static bool bpf_link_is_tracepoint(struct bpf_link *link)
> (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP);
> }
>
> -static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
> -{
> - if (rcu_trace_implies_rcu_gp())
> - bpf_link_defer_dealloc_rcu_gp(rcu);
> - else
> - call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
> -}
> -
> /* bpf_link_free is guaranteed to be called from process context */
> static void bpf_link_free(struct bpf_link *link)
> {
> @@ -3301,9 +3286,12 @@ static void bpf_link_free(struct bpf_link *link)
> * period wait instead when non-faultable tracepoint is used. We
> * don't need to chain SRCU grace period waits, however, for the
> * faultable case, since it exclusively uses RCU Tasks Trace.
> + *
> + * RCU Tasks Trace grace period implies RCU grace period, hence
> + * pass bpf_link_defer_dealloc_rcu_gp as callback directly.
> */
> if (link->sleepable || (link->prog && link->prog->sleepable))
> - call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
> + call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
> /* We need to do a SRCU grace period wait for tracepoint-based BPF links. */
> else if (bpf_link_is_tracepoint(link) && tp_srcu)
> call_srcu(tp_srcu, &link->rcu, bpf_link_defer_dealloc_rcu_gp);
> --
> 2.52.0
next prev parent reply other threads:[~2026-03-30 10:17 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-30 3:21 [PATCH bpf v1 0/2] Fix bpf_link grace period wait for tracepoints Kumar Kartikeya Dwivedi
2026-03-30 3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi
2026-03-30 3:36 ` Kumar Kartikeya Dwivedi
2026-03-30 10:00 ` Puranjay Mohan
2026-03-30 14:03 ` Kumar Kartikeya Dwivedi
2026-03-30 9:52 ` Puranjay Mohan
2026-03-30 14:02 ` Kumar Kartikeya Dwivedi
2026-03-30 15:07 ` Steven Rostedt
2026-03-30 15:27 ` Kumar Kartikeya Dwivedi
2026-03-30 16:10 ` Steven Rostedt
2026-03-30 3:21 ` [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() Kumar Kartikeya Dwivedi
2026-03-30 10:17 ` Puranjay Mohan [this message]
2026-03-30 10:40 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m2341h1v0m.fsf@kernel.org \
--to=puranjay@kernel.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=kernel-team@meta.com \
--cc=kkd@meta.com \
--cc=martin.lau@kernel.org \
--cc=memxor@gmail.com \
--cc=paulmck@kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox