public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Puranjay Mohan <puranjay@kernel.org>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	kkd@meta.com, kernel-team@meta.com
Subject: Re: [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp()
Date: Mon, 30 Mar 2026 11:17:29 +0100	[thread overview]
Message-ID: <m2341h1v0m.fsf@kernel.org> (raw)
In-Reply-To: <20260330032124.3141001-3-memxor@gmail.com>

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> RCU Tasks Trace grace period implies RCU grace period, and this
> guarantee is expected to remain in the future. Only BPF is the user of
> this predicate, hence retire the API and clean up all in-tree users.

RCU Tasks Trace is now implemented on SRCU-fast and its grace period
mechanism always has atleast one call to synchronize_rcu() as it is
required for SRCU-fast's correctness (it replaces the smp_mb() that
SRCU-fast readers skip). So, RCU-tt GP will always imply RCU GP.

> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>

A nit below about a stale comment.

Reviewed-by: Puranjay Mohan <puranjay@kernel.org>

> ---
>  include/linux/rcupdate.h | 12 ------------
>  kernel/bpf/core.c        | 10 ++++------
>  kernel/bpf/memalloc.c    | 33 ++++++++++-----------------------
>  kernel/bpf/syscall.c     | 24 ++++++------------------
>  4 files changed, 20 insertions(+), 59 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 04f3f86a4145..bfa765132de8 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -205,18 +205,6 @@ static inline void exit_tasks_rcu_start(void) { }
>  static inline void exit_tasks_rcu_finish(void) { }
>  #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
>  
> -/**
> - * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
> - *
> - * As an accident of implementation, an RCU Tasks Trace grace period also
> - * acts as an RCU grace period.  However, this could change at any time.
> - * Code relying on this accident must call this function to verify that
> - * this accident is still happening.
> - *
> - * You have been warned!
> - */
> -static inline bool rcu_trace_implies_rcu_gp(void) { return true; }
> -

There is comment in helpers.c that needs to be dropped/updated:

kernel/bpf/helpers.c:1275:      /* rcu_trace_implies_rcu_gp() is true and will remain so */


>  /**
>   * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
>   *
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 7b675a451ec8..1984f061dcf4 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2641,14 +2641,12 @@ static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu)
>  {
>  	struct bpf_prog_array *progs;
>  
> -	/* If RCU Tasks Trace grace period implies RCU grace period, there is
> -	 * no need to call kfree_rcu(), just call kfree() directly.
> +	/*
> +	 * RCU Tasks Trace grace period implies RCU grace period, there is no
> +	 * need to call kfree_rcu(), just call kfree() directly.
>  	 */
>  	progs = container_of(rcu, struct bpf_prog_array, rcu);
> -	if (rcu_trace_implies_rcu_gp())
> -		kfree(progs);
> -	else
> -		kfree_rcu(progs, rcu);
> +	kfree(progs);
>  }
>  
>  void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs)
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 682a9f34214b..e9662db7198f 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -284,17 +284,6 @@ static void __free_rcu(struct rcu_head *head)
>  	atomic_set(&c->call_rcu_ttrace_in_progress, 0);
>  }
>  
> -static void __free_rcu_tasks_trace(struct rcu_head *head)
> -{
> -	/* If RCU Tasks Trace grace period implies RCU grace period,
> -	 * there is no need to invoke call_rcu().
> -	 */
> -	if (rcu_trace_implies_rcu_gp())
> -		__free_rcu(head);
> -	else
> -		call_rcu(head, __free_rcu);
> -}
> -
>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>  {
>  	struct llist_node *llnode = obj;
> @@ -326,12 +315,12 @@ static void do_call_rcu_ttrace(struct bpf_mem_cache *c)
>  		return;
>  	}
>  
> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> -	 * If RCU Tasks Trace grace period implies RCU grace period, free
> -	 * these elements directly, else use call_rcu() to wait for normal
> -	 * progs to finish and finally do free_one() on each element.
> +	/*
> +	 * Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> +	 * RCU Tasks Trace grace period implies RCU grace period, so pass
> +	 * __free_rcu directly as the callback.
>  	 */
> -	call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu_tasks_trace);
> +	call_rcu_tasks_trace(&c->rcu_ttrace, __free_rcu);
>  }
>  
>  static void free_bulk(struct bpf_mem_cache *c)
> @@ -696,20 +685,18 @@ static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
>  
>  static void free_mem_alloc(struct bpf_mem_alloc *ma)
>  {
> -	/* waiting_for_gp[_ttrace] lists were drained, but RCU callbacks
> +	/*
> +	 * waiting_for_gp[_ttrace] lists were drained, but RCU callbacks
>  	 * might still execute. Wait for them.
>  	 *
>  	 * rcu_barrier_tasks_trace() doesn't imply synchronize_rcu_tasks_trace(),
>  	 * but rcu_barrier_tasks_trace() and rcu_barrier() below are only used
> -	 * to wait for the pending __free_rcu_tasks_trace() and __free_rcu(),
> -	 * so if call_rcu(head, __free_rcu) is skipped due to
> -	 * rcu_trace_implies_rcu_gp(), it will be OK to skip rcu_barrier() by
> -	 * using rcu_trace_implies_rcu_gp() as well.
> +	 * to wait for the pending __free_by_rcu(), and __free_rcu(). RCU Tasks
> +	 * Trace grace period implies RCU grace period, so all __free_rcu don't
> +	 * need extra call_rcu() (and thus extra rcu_barrier() here).
>  	 */
>  	rcu_barrier(); /* wait for __free_by_rcu */
>  	rcu_barrier_tasks_trace(); /* wait for __free_rcu */
> -	if (!rcu_trace_implies_rcu_gp())
> -		rcu_barrier();
>  	free_mem_alloc_no_barrier(ma);
>  }
>  
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ab61a5ce35af..dbba0a5fa96e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -941,14 +941,6 @@ static void bpf_map_free_rcu_gp(struct rcu_head *rcu)
>  	bpf_map_free_in_work(container_of(rcu, struct bpf_map, rcu));
>  }
>  
> -static void bpf_map_free_mult_rcu_gp(struct rcu_head *rcu)
> -{
> -	if (rcu_trace_implies_rcu_gp())
> -		bpf_map_free_rcu_gp(rcu);
> -	else
> -		call_rcu(rcu, bpf_map_free_rcu_gp);
> -}
> -
>  /* decrement map refcnt and schedule it for freeing via workqueue
>   * (underlying map implementation ops->map_free() might sleep)
>   */
> @@ -959,8 +951,9 @@ void bpf_map_put(struct bpf_map *map)
>  		bpf_map_free_id(map);
>  
>  		WARN_ON_ONCE(atomic64_read(&map->sleepable_refcnt));
> +		/* RCU tasks trace grace period implies RCU grace period. */
>  		if (READ_ONCE(map->free_after_mult_rcu_gp))
> -			call_rcu_tasks_trace(&map->rcu, bpf_map_free_mult_rcu_gp);
> +			call_rcu_tasks_trace(&map->rcu, bpf_map_free_rcu_gp);
>  		else if (READ_ONCE(map->free_after_rcu_gp))
>  			call_rcu(&map->rcu, bpf_map_free_rcu_gp);
>  		else
> @@ -3268,14 +3261,6 @@ static bool bpf_link_is_tracepoint(struct bpf_link *link)
>  	       (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP);
>  }
>  
> -static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
> -{
> -	if (rcu_trace_implies_rcu_gp())
> -		bpf_link_defer_dealloc_rcu_gp(rcu);
> -	else
> -		call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
> -}
> -
>  /* bpf_link_free is guaranteed to be called from process context */
>  static void bpf_link_free(struct bpf_link *link)
>  {
> @@ -3301,9 +3286,12 @@ static void bpf_link_free(struct bpf_link *link)
>  		 * period wait instead when non-faultable tracepoint is used. We
>  		 * don't need to chain SRCU grace period waits, however, for the
>  		 * faultable case, since it exclusively uses RCU Tasks Trace.
> +		 *
> +		 * RCU Tasks Trace grace period implies RCU grace period, hence
> +		 * pass bpf_link_defer_dealloc_rcu_gp as callback directly.
>  		 */
>  		if (link->sleepable || (link->prog && link->prog->sleepable))
> -			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
> +			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
>  		/* We need to do a SRCU grace period wait for tracepoint-based BPF links. */
>  		else if (bpf_link_is_tracepoint(link) && tp_srcu)
>  			call_srcu(tp_srcu, &link->rcu, bpf_link_defer_dealloc_rcu_gp);
> -- 
> 2.52.0

  reply	other threads:[~2026-03-30 10:17 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-30  3:21 [PATCH bpf v1 0/2] Fix bpf_link grace period wait for tracepoints Kumar Kartikeya Dwivedi
2026-03-30  3:21 ` [PATCH bpf v1 1/2] bpf: Fix grace period wait for tracepoint bpf_link Kumar Kartikeya Dwivedi
2026-03-30  3:36   ` Kumar Kartikeya Dwivedi
2026-03-30 10:00     ` Puranjay Mohan
2026-03-30 14:03       ` Kumar Kartikeya Dwivedi
2026-03-30  9:52   ` Puranjay Mohan
2026-03-30 14:02     ` Kumar Kartikeya Dwivedi
2026-03-30 15:07   ` Steven Rostedt
2026-03-30 15:27     ` Kumar Kartikeya Dwivedi
2026-03-30 16:10       ` Steven Rostedt
2026-03-30  3:21 ` [PATCH bpf v1 2/2] bpf: Retire rcu_trace_implies_rcu_gp() Kumar Kartikeya Dwivedi
2026-03-30 10:17   ` Puranjay Mohan [this message]
2026-03-30 10:40   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m2341h1v0m.fsf@kernel.org \
    --to=puranjay@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=kkd@meta.com \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox