public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	rostedt@goodmis.org, "Paul E. McKenney" <paulmck@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	bpf@vger.kernel.org
Subject: [PATCH 22/34] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
Date: Tue, 23 Sep 2025 07:20:24 -0700	[thread overview]
Message-ID: <20250923142036.112290-22-paulmck@kernel.org> (raw)
In-Reply-To: <580ea2de-799a-4ddc-bde9-c16f3fb1e6e7@paulmck-laptop>

The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
to protect invocation of __DO_TRACE_CALL() means that BPF programs
attached to tracepoints are non-preemptible.  This is unhelpful in
real-time systems, whose users apparently wish to use BPF while also
achieving low latencies.  (Who knew?)

One option would be to use preemptible RCU, but this introduces
many opportunities for infinite recursion, which many consider to
be counterproductive, especially given the relatively small stacks
provided by the Linux kernel.  These opportunities could be shut down
by sufficiently energetic duplication of code, but this sort of thing
is considered impolite in some circles.

Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
readers than those of preemptible RCU, at least on my laptop, where
task_struct access is more expensive than access to per-CPU variables.
And SRCU fast provides way faster readers than does SRCU, courtesy of
being able to avoid the read-side use of smp_mb().  Also, it is quite
straightforward to create srcu_read_{,un}lock_fast_notrace() functions.

While in the area, SRCU now supports early boot call_srcu().  Therefore,
remove the checks that used to avoid such use from rcu_free_old_probes()
before this commit was applied:

e53244e2c893 ("tracepoint: Remove SRCU protection")

The current commit can be thought of as an approximate revert of that
commit, with some compensating additions of preemption disabling pointed
out by Steven Rostedt (thank you, Steven!).  This preemption disabling
uses guard(preempt_notrace)(), and while in the area a couple of other
use cases were also converted to guards.

However, Yonghong Song points out that BPF expects non-sleepable BPF
programs to remain on the same CPU, which means that migration must
be disabled whenever preemption remains enabled.  In addition, non-RT
kernels have performance expectations on BPF that would be violated
by allowing the BPF programs to be preempted.

Therefore, continue to disable preemption in non-RT kernels, and protect
the BPF program with both SRCU and migration disabling for RT kernels,
and even then only if preemption is not already disabled.

[ paulmck: Apply kernel test robot and Yonghong Song feedback. ]

Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
---
 include/linux/tracepoint.h   | 45 ++++++++++++++++++++++--------------
 include/trace/perf.h         |  4 ++--
 include/trace/trace_events.h |  4 ++--
 kernel/tracepoint.c          | 21 ++++++++++++++++-
 4 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 826ce3f8e1f851..9f8b19cd303acc 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -33,6 +33,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -115,7 +117,10 @@ void for_each_tracepoint_in_module(struct module *mod,
 static inline void tracepoint_synchronize_unregister(void)
 {
 	synchronize_rcu_tasks_trace();
-	synchronize_rcu();
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		synchronize_srcu(&tracepoint_srcu);
+	else
+		synchronize_rcu();
 }
 static inline bool tracepoint_is_faultable(struct tracepoint *tp)
 {
@@ -266,23 +271,29 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		return static_branch_unlikely(&__tracepoint_##name.key);\
 	}
 
-#define __DECLARE_TRACE(name, proto, args, cond, data_proto)		\
+#define __DECLARE_TRACE(name, proto, args, cond, data_proto)			\
 	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
-	static inline void __do_trace_##name(proto)			\
-	{								\
-		if (cond) {						\
-			guard(preempt_notrace)();			\
-			__DO_TRACE_CALL(name, TP_ARGS(args));		\
-		}							\
-	}								\
-	static inline void trace_##name(proto)				\
-	{								\
-		if (static_branch_unlikely(&__tracepoint_##name.key))	\
-			__do_trace_##name(args);			\
-		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
-			WARN_ONCE(!rcu_is_watching(),			\
-				  "RCU not watching for tracepoint");	\
-		}							\
+	static inline void __do_trace_##name(proto)				\
+	{									\
+		if (cond) {							\
+			if (IS_ENABLED(CONFIG_PREEMPT_RT) && preemptible()) {	\
+				guard(srcu_fast_notrace)(&tracepoint_srcu);	\
+				guard(migrate)();				\
+				__DO_TRACE_CALL(name, TP_ARGS(args));		\
+			} else {						\
+				guard(preempt_notrace)();			\
+				__DO_TRACE_CALL(name, TP_ARGS(args));		\
+			}							\
+		}								\
+	}									\
+	static inline void trace_##name(proto)					\
+	{									\
+		if (static_branch_unlikely(&__tracepoint_##name.key))		\
+			__do_trace_##name(args);				\
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {			\
+			WARN_ONCE(!rcu_is_watching(),				\
+				  "RCU not watching for tracepoint");		\
+		}								\
 	}
 
 #define __DECLARE_TRACE_SYSCALL(name, proto, args, data_proto)		\
diff --git a/include/trace/perf.h b/include/trace/perf.h
index a1754b73a8f55b..348ad1d9b5566e 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -71,6 +71,7 @@ perf_trace_##call(void *__data, proto)					\
 	u64 __count __attribute__((unused));				\
 	struct task_struct *__task __attribute__((unused));		\
 									\
+	guard(preempt_notrace)();					\
 	do_perf_trace_##call(__data, args);				\
 }
 
@@ -85,9 +86,8 @@ perf_trace_##call(void *__data, proto)					\
 	struct task_struct *__task __attribute__((unused));		\
 									\
 	might_fault();							\
-	preempt_disable_notrace();					\
+	guard(preempt_notrace)();					\
 	do_perf_trace_##call(__data, args);				\
-	preempt_enable_notrace();					\
 }
 
 /*
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4f22136fd4656c..fbc07d353be6b6 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -436,6 +436,7 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
 static notrace void							\
 trace_event_raw_event_##call(void *__data, proto)			\
 {									\
+	guard(preempt_notrace)();					\
 	do_trace_event_raw_event_##call(__data, args);			\
 }
 
@@ -447,9 +448,8 @@ static notrace void							\
 trace_event_raw_event_##call(void *__data, proto)			\
 {									\
 	might_fault();							\
-	preempt_disable_notrace();					\
+	guard(preempt_notrace)();					\
 	do_trace_event_raw_event_##call(__data, args);			\
-	preempt_enable_notrace();					\
 }
 
 /*
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 62719d2941c900..21bb6779821476 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -25,6 +25,9 @@ enum tp_func_state {
 extern tracepoint_ptr_t __start___tracepoints_ptrs[];
 extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU_FAST(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 enum tp_transition_sync {
 	TP_TRANSITION_SYNC_1_0_1,
 	TP_TRANSITION_SYNC_N_2_1,
@@ -34,6 +37,7 @@ enum tp_transition_sync {
 
 struct tp_transition_snapshot {
 	unsigned long rcu;
+	unsigned long srcu_gp;
 	bool ongoing;
 };
 
@@ -46,6 +50,7 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
 
 	/* Keep the latest get_state snapshot. */
 	snapshot->rcu = get_state_synchronize_rcu();
+	snapshot->srcu_gp = start_poll_synchronize_srcu(&tracepoint_srcu);
 	snapshot->ongoing = true;
 }
 
@@ -56,6 +61,8 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
 	if (!snapshot->ongoing)
 		return;
 	cond_synchronize_rcu(snapshot->rcu);
+	if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu_gp))
+		synchronize_srcu(&tracepoint_srcu);
 	snapshot->ongoing = false;
 }
 
@@ -101,17 +108,29 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
 
+		/*
+		 * Tracepoint probes are protected by either RCU or
+		 * Tasks Trace RCU and also by SRCU.  By calling the SRCU
+		 * callback in the [Tasks Trace] RCU callback we cover
+		 * both cases. So let us chain the SRCU and [Tasks Trace]
+		 * RCU callbacks to wait for both grace periods.
+		 */
 		if (tracepoint_is_faultable(tp))
 			call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
 		else
-- 
2.40.1


  parent reply	other threads:[~2025-09-23 14:21 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-23 14:20 [PATCH 0/34] Implement RCU Tasks Trace in terms of SRCU-fast and optimize Paul E. McKenney
2025-09-23 14:20 ` [PATCH 01/34] rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast Paul E. McKenney
2025-09-25 18:39   ` Andrii Nakryiko
2025-09-25 18:58     ` Paul E. McKenney
2025-09-23 14:20 ` [PATCH 02/34] rcu: Remove unused ->trc_ipi_to_cpu and ->trc_blkd_cpu from task_struct Paul E. McKenney
2025-09-23 14:20 ` [PATCH 03/34] rcu: Remove ->trc_blkd_node " Paul E. McKenney
2025-09-23 14:20 ` [PATCH 04/34] rcu: Remove ->trc_holdout_list " Paul E. McKenney
2025-09-23 14:20 ` [PATCH 05/34] rcu: Remove rcu_tasks_trace_qs() and the functions that it calls Paul E. McKenney
2025-09-23 14:20 ` [PATCH 06/34] context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}() Paul E. McKenney
2025-09-23 14:20 ` [PATCH 07/34] rcu: Remove ->trc_reader_special from task_struct Paul E. McKenney
2025-09-23 14:20 ` [PATCH 08/34] rcu: Remove now-empty RCU Tasks Trace functions and calls to them Paul E. McKenney
2025-09-23 14:20 ` [PATCH 09/34] rcu: Remove unused rcu_tasks_trace_lazy_ms and trc_stall_chk_rdr struct Paul E. McKenney
2025-09-23 14:20 ` [PATCH 10/34] rcu: Remove now-empty show_rcu_tasks_trace_gp_kthread() function Paul E. McKenney
2025-09-23 14:20 ` [PATCH 11/34] rcu: Remove now-empty rcu_tasks_trace_get_gp_data() function Paul E. McKenney
2025-09-23 14:20 ` [PATCH 12/34] rcu: Remove now-empty rcu_tasks_trace_torture_stats_print() function Paul E. McKenney
2025-09-23 14:20 ` [PATCH 13/34] rcu: Remove now-empty get_rcu_tasks_trace_gp_kthread() function Paul E. McKenney
2025-09-23 14:20 ` [PATCH 14/34] rcu: Move rcu_tasks_trace_srcu_struct out of #ifdef CONFIG_TASKS_RCU_GENERIC Paul E. McKenney
2025-09-23 14:20 ` [PATCH 15/34] rcu: Add noinstr-fast rcu_read_{,un}lock_tasks_trace() APIs Paul E. McKenney
2025-09-23 17:32   ` Peter Zijlstra
2025-09-24  8:44     ` Paul E. McKenney
2025-09-24  9:08       ` Peter Zijlstra
2025-09-23 14:20 ` [PATCH 16/34] rcu: Remove now-unused rcu_task_ipi_delay and TASKS_TRACE_RCU_READ_MB Paul E. McKenney
2025-09-23 14:20 ` [PATCH 17/34] srcu: Create a DEFINE_SRCU_FAST() Paul E. McKenney
2025-09-23 14:20 ` [PATCH 18/34] rcu: Use smp_mb() only when necessary in RCU Tasks Trace readers Paul E. McKenney
2025-09-23 14:20 ` [PATCH 19/34] rcu: Update Requirements.rst for RCU Tasks Trace Paul E. McKenney
2025-09-25 18:40   ` Andrii Nakryiko
2025-09-25 18:53     ` Paul E. McKenney
2025-09-23 14:20 ` [PATCH 20/34] checkpatch: Deprecate rcu_read_{,un}lock_trace() Paul E. McKenney
2025-09-23 15:47   ` Joe Perches
2025-09-23 17:01     ` Paul E. McKenney
2025-09-23 14:20 ` [PATCH 21/34] rcu: Mark diagnostic functions as notrace Paul E. McKenney
2025-09-23 14:20 ` Paul E. McKenney [this message]
2025-09-23 14:20 ` [PATCH 23/34] srcu: Create an srcu_expedite_current() function Paul E. McKenney
2025-09-24  0:10   ` Zqiang
2025-09-24  8:56     ` Paul E. McKenney
2025-09-23 14:20 ` [PATCH 24/34] rcutorture: Test srcu_expedite_current() Paul E. McKenney
2025-09-23 14:20 ` [PATCH 25/34] srcu: Create an rcu_tasks_trace_expedite_current() function Paul E. McKenney
2025-09-23 14:20 ` [PATCH 26/34] rcutorture: Test rcu_tasks_trace_expedite_current() Paul E. McKenney
2025-09-23 14:20 ` [PATCH 27/34] srcu: Make DEFINE_SRCU_FAST() available to modules Paul E. McKenney
2025-09-23 14:20 ` [PATCH 28/34] srcu: Make SRCU-fast available to heap srcu_struct structures Paul E. McKenney
2025-09-23 14:20 ` [PATCH 29/34] srcu: Make grace-period determination use ssp->srcu_reader_flavor Paul E. McKenney
2025-09-23 14:20 ` [PATCH 30/34] rcutorture: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast() Paul E. McKenney
2025-09-23 14:20 ` [PATCH 31/34] refscale: " Paul E. McKenney
2025-09-23 14:20 ` [PATCH 32/34] srcu: Require special srcu_struct define/init for SRCU-fast readers Paul E. McKenney
2025-09-23 14:20 ` [PATCH 33/34] srcu: Make SRCU-fast readers enforce use of SRCU-fast definition/init Paul E. McKenney
2025-09-23 14:20 ` [PATCH 34/34] doc: Update for SRCU-fast definitions and initialization Paul E. McKenney
2025-09-24  7:49 ` [PATCH 0/34] Implement RCU Tasks Trace in terms of SRCU-fast and optimize Alexei Starovoitov
2025-09-24  8:20   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250923142036.112290-22-paulmck@kernel.org \
    --to=paulmck@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox