* [PATCH 0/6] Switch __DECLARE_TRACE() to new notrace variant of SRCU-fast
From: Paul E. McKenney @ 2025-07-23 20:27 UTC
To: rcu; +Cc: linux-kernel, kernel-team, rostedt
Hello!
This is version 4 of a patch series that creates a new notrace variant of
SRCU-fast and uses it in __DECLARE_TRACE() in place of the current
preemption disabling. This change enables preemption of BPF programs
attached to tracepoints, as is required for runtime use of BPF in
real-time systems.
This series continues to trigger a kernel test robot report of a
"using smp_processor_id() in preemptible" splat. I looked for issues
with explicit preemption disabling and, not finding any, will next turn
my attention to accesses to per-CPU variables. Any and all insights
are welcome.
1. Move rcu_is_watching() checks to srcu_read_{,un}lock_fast().
2. Add srcu_read_lock_fast_notrace() and
srcu_read_unlock_fast_notrace().
3. Add guards for notrace variants of SRCU-fast readers.
4. Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast.
5. Document __srcu_read_{,un}lock_fast() implicit RCU readers.
6. Document srcu_flip() memory-barrier D relation to SRCU-fast.
Changes since v3:
o Add "notrace" per feedback from Joel, Steven, and Mathieu.
o Upgrade explanatory comments and add new ones per Joel's feedback.
https://lore.kernel.org/all/20250721162433.10454-1-paulmck@kernel.org/
Changes since v2:
o Posting standalone as opposed to a reply.
https://lore.kernel.org/all/3cecf6c9-b2ee-4f34-9d1b-ca4cfb8e56a7@paulmck-laptop/
Changes since RFC version:
o RFC patch 6/4 has been pulled into the shared RCU tree:
e88c632a8698 ("srcu: Add guards for SRCU-fast readers")
o RFC patch 5/4 (which removed the now-unnecessary special boot-time
avoidance of SRCU) has been folded into patch 4/4 shown above,
as suggested by Steven Rostedt.
https://lore.kernel.org/all/bb20a575-235b-499e-aa1d-70fe9e2c7617@paulmck-laptop/
Thanx, Paul
------------------------------------------------------------------------
b/include/linux/srcu.h | 4 +++
b/include/linux/srcutree.h | 2 -
b/include/linux/tracepoint.h | 6 +++--
b/kernel/rcu/srcutree.c | 10 +++++++++
b/kernel/tracepoint.c | 21 ++++++++++++++++++-
include/linux/srcu.h | 35 ++++++++++++++++++++++++++++++--
include/linux/srcutree.h | 47 +++++++++++++++++++++++++++----------------
7 files changed, 101 insertions(+), 24 deletions(-)
* [PATCH v4 1/6] srcu: Move rcu_is_watching() checks to srcu_read_{,un}lock_fast()
From: Paul E. McKenney @ 2025-07-23 20:27 UTC
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Joel Fernandes, Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
The rcu_is_watching() warnings are currently in the SRCU-tree
implementations of __srcu_read_lock_fast() and __srcu_read_unlock_fast().
However, this makes it difficult to create _notrace variants of
srcu_read_lock_fast() and srcu_read_unlock_fast(). This commit therefore
moves these checks to srcu_read_lock_fast(), srcu_read_unlock_fast(),
srcu_down_read_fast(), and srcu_up_read_fast().
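For example, a reader now gets these checks in the public wrappers rather
than in the double-underscore internals (a minimal sketch; my_srcu, my_data,
struct foo, and do_something_with() are hypothetical):

	struct srcu_ctr __percpu *scp;
	struct foo *p;

	scp = srcu_read_lock_fast(&my_srcu);	/* RCU_LOCKDEP_WARN() checks rcu_is_watching() here. */
	p = srcu_dereference(my_data, &my_srcu);	/* Read-side critical section. */
	do_something_with(p);			/* Hypothetical consumer. */
	srcu_read_unlock_fast(&my_srcu, scp);	/* ...and the check happens again here on exit. */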
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
---
include/linux/srcu.h | 4 ++++
include/linux/srcutree.h | 2 --
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index f179700fecafb..478c73d067f7d 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -275,6 +275,7 @@ static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *
{
struct srcu_ctr __percpu *retval;
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast().");
srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
retval = __srcu_read_lock_fast(ssp);
rcu_try_lock_acquire(&ssp->dep_map);
@@ -295,6 +296,7 @@ static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *
static inline struct srcu_ctr __percpu *srcu_down_read_fast(struct srcu_struct *ssp) __acquires(ssp)
{
WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi());
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_down_read_fast().");
srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
return __srcu_read_lock_fast(ssp);
}
@@ -389,6 +391,7 @@ static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ct
srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
srcu_lock_release(&ssp->dep_map);
__srcu_read_unlock_fast(ssp, scp);
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast().");
}
/**
@@ -405,6 +408,7 @@ static inline void srcu_up_read_fast(struct srcu_struct *ssp, struct srcu_ctr __
WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && in_nmi());
srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
__srcu_read_unlock_fast(ssp, scp);
+ RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_up_read_fast().");
}
/**
diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index bf44d8d1e69ea..043b5a67ef71e 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -244,7 +244,6 @@ static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct
{
struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
- RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_lock_fast().");
if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
this_cpu_inc(scp->srcu_locks.counter); /* Y */
else
@@ -275,7 +274,6 @@ static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_
this_cpu_inc(scp->srcu_unlocks.counter); /* Z */
else
atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); /* Z */
- RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast().");
}
void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor);
--
2.40.1
* [PATCH v4 2/6] srcu: Add srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace()
From: Paul E. McKenney @ 2025-07-23 20:27 UTC
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Joel Fernandes, Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
This commit adds no-trace variants of the srcu_read_lock_fast() and
srcu_read_unlock_fast() functions for tracing use.
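For example, a tracing-path reader would do something like this (a minimal
sketch; ssp points to a hypothetical srcu_struct and the comment stands in
for the real tracing work):

	struct srcu_ctr __percpu *scp;

	scp = srcu_read_lock_fast_notrace(ssp);		/* No tracing hooks, no lockdep. */
	/* ... read-side critical section ... */
	srcu_read_unlock_fast_notrace(ssp, scp);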
[ paulmck: Apply notrace feedback from Joel Fernandes, Steven Rostedt, and Mathieu Desnoyers. ]
Link: https://lore.kernel.org/all/20250721162433.10454-1-paulmck@kernel.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
---
include/linux/srcu.h | 30 ++++++++++++++++++++++++++++--
include/linux/srcutree.h | 5 +++--
2 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 478c73d067f7d..ec3b8e27d6c5a 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -271,7 +271,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
* where RCU is watching, that is, from contexts where it would be legal
* to invoke rcu_read_lock(). Otherwise, lockdep will complain.
*/
-static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *ssp) __acquires(ssp)
+static inline struct srcu_ctr __percpu notrace *srcu_read_lock_fast(struct srcu_struct *ssp) __acquires(ssp)
{
struct srcu_ctr __percpu *retval;
@@ -282,6 +282,20 @@ static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *
return retval;
}
+/*
+ * Used by tracing, cannot be traced and cannot call lockdep.
+ * See srcu_read_lock_fast() for more information.
+ */
+static inline struct srcu_ctr __percpu *srcu_read_lock_fast_notrace(struct srcu_struct *ssp)
+ __acquires(ssp)
+{
+ struct srcu_ctr __percpu *retval;
+
+ srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
+ retval = __srcu_read_lock_fast(ssp);
+ return retval;
+}
+
/**
* srcu_down_read_fast - register a new reader for an SRCU-protected structure.
* @ssp: srcu_struct in which to register the new reader.
@@ -385,7 +399,8 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
*
* Exit a light-weight SRCU read-side critical section.
*/
-static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
+static inline void notrace
+srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
__releases(ssp)
{
srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
@@ -394,6 +409,17 @@ static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ct
RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast().");
}
+/*
+ * Used by tracing, cannot be traced and cannot call lockdep.
+ * See srcu_read_unlock_fast() for more information.
+ */
+static inline void srcu_read_unlock_fast_notrace(struct srcu_struct *ssp,
+ struct srcu_ctr __percpu *scp) __releases(ssp)
+{
+ srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
+ __srcu_read_unlock_fast(ssp, scp);
+}
+
/**
* srcu_up_read_fast - unregister a old reader from an SRCU-protected structure.
* @ssp: srcu_struct in which to unregister the old reader.
diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 043b5a67ef71e..4d2fee4d38289 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -240,7 +240,7 @@ static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ss
* on architectures that support NMIs but do not supply NMI-safe
* implementations of this_cpu_inc().
*/
-static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp)
+static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast(struct srcu_struct *ssp)
{
struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
@@ -267,7 +267,8 @@ static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct
* on architectures that support NMIs but do not supply NMI-safe
* implementations of this_cpu_inc().
*/
-static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
+static inline void notrace
+__srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
{
barrier(); /* Avoid leaking the critical section. */
if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
--
2.40.1
* [PATCH v4 3/6] srcu: Add guards for notrace variants of SRCU-fast readers
From: Paul E. McKenney @ 2025-07-23 20:27 UTC
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Joel Fernandes, Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
This adds the usual scoped_guard(srcu_fast_notrace, &my_srcu) and
guard(srcu_fast_notrace)(&my_srcu).
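For example (a minimal sketch; my_srcu is a hypothetical srcu_struct):

	static void reader(void)
	{
		guard(srcu_fast_notrace)(&my_srcu);
		/* Read-side critical section, exited on function return. */
	}

	static void scoped_reader(void)
	{
		scoped_guard(srcu_fast_notrace, &my_srcu) {
			/* Read-side critical section, exited at the closing brace. */
		}
	}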
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
---
include/linux/srcu.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index ec3b8e27d6c5a..1dc677bc9abca 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -516,4 +516,9 @@ DEFINE_LOCK_GUARD_1(srcu_fast, struct srcu_struct,
srcu_read_unlock_fast(_T->lock, _T->scp),
struct srcu_ctr __percpu *scp)
+DEFINE_LOCK_GUARD_1(srcu_fast_notrace, struct srcu_struct,
+ _T->scp = srcu_read_lock_fast_notrace(_T->lock),
+ srcu_read_unlock_fast_notrace(_T->lock, _T->scp),
+ struct srcu_ctr __percpu *scp)
+
#endif
--
2.40.1
* [PATCH v4 4/6] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
From: Paul E. McKenney @ 2025-07-23 20:27 UTC
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
to protect invocation of __DO_TRACE_CALL() means that BPF programs
attached to tracepoints are non-preemptible. This is unhelpful in
real-time systems, whose users apparently wish to use BPF while also
achieving low latencies. (Who knew?)
One option would be to use preemptible RCU, but this introduces
many opportunities for infinite recursion, which many consider to
be counterproductive, especially given the relatively small stacks
provided by the Linux kernel. These opportunities could be shut down
by sufficiently energetic duplication of code, but this sort of thing
is considered impolite in some circles.
Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
readers than those of preemptible RCU, at least on my laptop, where
task_struct access is more expensive than access to per-CPU variables.
And SRCU-fast provides way faster readers than does SRCU, courtesy of
being able to avoid the read-side use of smp_mb(). Also, it is quite
straightforward to create srcu_read_{,un}lock_fast_notrace() functions.
While in the area, note that SRCU now supports early boot call_srcu().
Therefore, omit the checks in rcu_free_old_probes() that avoided such
use before this commit was applied:
e53244e2c893 ("tracepoint: Remove SRCU protection")
The current commit can be thought of as an approximate revert of that
commit.
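The resulting reader/updater pairing looks like this (a sketch assembled
from this patch's hunks; name, cond, and args are the usual
__DECLARE_TRACE() parameters):

	/* Reader, in the generated __do_trace_##name(): */
	if (cond) {
		guard(srcu_fast_notrace)(&tracepoint_srcu);
		__DO_TRACE_CALL(name, TP_ARGS(args));
	}

	/* Updater, after unregistering a probe: */
	tracepoint_synchronize_unregister();	/* Now includes synchronize_srcu(&tracepoint_srcu). */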
Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
---
include/linux/tracepoint.h | 6 ++++--
kernel/tracepoint.c | 21 ++++++++++++++++++++-
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 826ce3f8e1f85..a22c1ab88560b 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -33,6 +33,8 @@ struct trace_eval_map {
#define TRACEPOINT_DEFAULT_PRIO 10
+extern struct srcu_struct tracepoint_srcu;
+
extern int
tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
extern int
@@ -115,7 +117,7 @@ void for_each_tracepoint_in_module(struct module *mod,
static inline void tracepoint_synchronize_unregister(void)
{
synchronize_rcu_tasks_trace();
- synchronize_rcu();
+ synchronize_srcu(&tracepoint_srcu);
}
static inline bool tracepoint_is_faultable(struct tracepoint *tp)
{
@@ -271,7 +273,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
static inline void __do_trace_##name(proto) \
{ \
if (cond) { \
- guard(preempt_notrace)(); \
+ guard(srcu_fast_notrace)(&tracepoint_srcu); \
__DO_TRACE_CALL(name, TP_ARGS(args)); \
} \
} \
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 62719d2941c90..e19973015cbd7 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -25,6 +25,9 @@ enum tp_func_state {
extern tracepoint_ptr_t __start___tracepoints_ptrs[];
extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
enum tp_transition_sync {
TP_TRANSITION_SYNC_1_0_1,
TP_TRANSITION_SYNC_N_2_1,
@@ -34,6 +37,7 @@ enum tp_transition_sync {
struct tp_transition_snapshot {
unsigned long rcu;
+ unsigned long srcu_gp;
bool ongoing;
};
@@ -46,6 +50,7 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
/* Keep the latest get_state snapshot. */
snapshot->rcu = get_state_synchronize_rcu();
+ snapshot->srcu_gp = start_poll_synchronize_srcu(&tracepoint_srcu);
snapshot->ongoing = true;
}
@@ -56,6 +61,8 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
if (!snapshot->ongoing)
return;
cond_synchronize_rcu(snapshot->rcu);
+ if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu_gp))
+ synchronize_srcu(&tracepoint_srcu);
snapshot->ongoing = false;
}
@@ -101,17 +108,29 @@ static inline void *allocate_probes(int count)
return p == NULL ? NULL : p->probes;
}
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
{
kfree(container_of(head, struct tp_probes, rcu));
}
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+ call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
{
if (old) {
struct tp_probes *tp_probes = container_of(old,
struct tp_probes, probes[0]);
+ /*
+ * Tracepoint probes are protected by either RCU or
+ * Tasks Trace RCU and also by SRCU. By calling the SRCU
+ * callback in the [Tasks Trace] RCU callback we cover
+ * both cases. So let us chain the SRCU and [Tasks Trace]
+ * RCU callbacks to wait for both grace periods.
+ */
if (tracepoint_is_faultable(tp))
call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
else
--
2.40.1
* [PATCH v4 5/6] srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
From: Paul E. McKenney @ 2025-07-23 20:27 UTC
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
This commit documents the implicit RCU readers implied by the
this_cpu_inc() and atomic_long_inc() operations in __srcu_read_lock_fast()
and __srcu_read_unlock_fast(). While in the area, fix the documented
memory-barrier pairing of atomic_long_inc() in __srcu_read_lock_fast().
[ paulmck: Apply Joel Fernandes feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
---
include/linux/srcutree.h | 42 ++++++++++++++++++++++++++--------------
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 4d2fee4d38289..42098e0fa0b7d 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -232,9 +232,27 @@ static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ss
* srcu_read_unlock_fast().
*
* Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side
- * critical sections either because they disables interrupts, because they
- * are a single instruction, or because they are a read-modify-write atomic
- * operation, depending on the whims of the architecture.
+ * critical sections either because they disables interrupts, because
+ * they are a single instruction, or because they are read-modify-write
+ * atomic operations, depending on the whims of the architecture.
+ * This matters because the SRCU-fast grace-period mechanism uses either
+ * synchronize_rcu() or synchronize_rcu_expedited(), that is, RCU,
+ * *not* SRCU, in order to eliminate the need for the read-side smp_mb()
+ * invocations that are used by srcu_read_lock() and srcu_read_unlock().
+ * The __srcu_read_unlock_fast() function also relies on this same RCU
+ * (again, *not* SRCU) trick to eliminate the need for smp_mb().
+ *
+ * The key point behind this RCU trick is that if any part of a given
+ * RCU reader precedes the beginning of a given RCU grace period, then
+ * the entirety of that RCU reader and everything preceding it happens
+ * before the end of that same RCU grace period. Similarly, if any part
+ * of a given RCU reader follows the end of a given RCU grace period,
+ * then the entirety of that RCU reader and everything following it
+ * happens after the beginning of that same RCU grace period. Therefore,
+ * the operations labeled Y in __srcu_read_lock_fast() and those labeled Z
+ * in __srcu_read_unlock_fast() are ordered against the corresponding SRCU
+ * read-side critical section from the viewpoint of the SRCU grace period.
+ * This is all the ordering that is required, hence no calls to smp_mb().
*
* This means that __srcu_read_lock_fast() is not all that fast
* on architectures that support NMIs but do not supply NMI-safe
@@ -245,9 +263,9 @@ static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast(struct src
struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
- this_cpu_inc(scp->srcu_locks.counter); /* Y */
+ this_cpu_inc(scp->srcu_locks.counter); // Y, and implicit RCU reader.
else
- atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); /* Z */
+ atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); // Y, and implicit RCU reader.
barrier(); /* Avoid leaking the critical section. */
return scp;
}
@@ -258,23 +276,17 @@ static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast(struct src
* different CPU than that which was incremented by the corresponding
* srcu_read_lock_fast(), but it must be within the same task.
*
- * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side
- * critical sections either because they disables interrupts, because they
- * are a single instruction, or because they are a read-modify-write atomic
- * operation, depending on the whims of the architecture.
- *
- * This means that __srcu_read_unlock_fast() is not all that fast
- * on architectures that support NMIs but do not supply NMI-safe
- * implementations of this_cpu_inc().
+ * Please see the __srcu_read_lock_fast() function's header comment for
+ * information on implicit RCU readers and NMI safety.
*/
static inline void notrace
__srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
{
barrier(); /* Avoid leaking the critical section. */
if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
- this_cpu_inc(scp->srcu_unlocks.counter); /* Z */
+ this_cpu_inc(scp->srcu_unlocks.counter); // Z, and implicit RCU reader.
else
- atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); /* Z */
+ atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); // Z, and implicit RCU reader.
}
void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor);
--
2.40.1
* [PATCH v4 6/6] srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
From: Paul E. McKenney @ 2025-07-23 20:28 UTC
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
The smp_mb() memory barrier at the end of srcu_flip() has a comment,
but that comment does not make it clear that this memory barrier is an
optimization, as opposed to being needed for correctness. This commit
therefore adds this information and points out that the optimization
does not carry over to SRCU-fast, where it would require a much
heavier-weight synchronize_rcu() in place of the smp_mb().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <bpf@vger.kernel.org>
---
kernel/rcu/srcutree.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index c5e8ebc493d5e..1ff94b76d91f1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1168,6 +1168,16 @@ static void srcu_flip(struct srcu_struct *ssp)
* counter update. Note that both this memory barrier and the
* one in srcu_readers_active_idx_check() provide the guarantee
* for __srcu_read_lock().
+ *
+ * Note that this is a performance optimization, in which we spend
+ * an otherwise unnecessary smp_mb() in order to reduce the number
+ * of full per-CPU-variable scans in srcu_readers_lock_idx() and
+ * srcu_readers_unlock_idx(). But this performance optimization
+ * is not so optimal for SRCU-fast, where we would be spending
+ * not smp_mb(), but rather synchronize_rcu(). At the same time,
+ * the overhead of the smp_mb() is in the noise, so there is no
+ * point in omitting it in the SRCU-fast case. So the same code
+ * is executed either way.
*/
smp_mb(); /* D */ /* Pairs with C. */
}
--
2.40.1
* Re: [PATCH 0/6] Switch __DECLARE_TRACE() to new notrace variant of SRCU-fast
From: Steven Rostedt @ 2025-07-23 20:34 UTC
To: Paul E. McKenney; +Cc: rcu, linux-kernel, kernel-team
On Wed, 23 Jul 2025 13:27:54 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:
> This series continues to trigger a kernel test robot report of a
> "using smp_processor_id() in preemptible" splat. I looked for issues
> with explicit preemption disabling and, not finding any, will next turn
> my attention to accesses to per-CPU variables. Any and all insights
> are welcome.
Currently perf and ftrace expect the tracepoints to be called with
preemption disabled. You may need this:
diff --git a/include/trace/perf.h b/include/trace/perf.h
index a1754b73a8f5..1b7925a85966 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -71,7 +71,9 @@ perf_trace_##call(void *__data, proto) \
u64 __count __attribute__((unused)); \
struct task_struct *__task __attribute__((unused)); \
\
+ preempt_disable_notrace(); \
do_perf_trace_##call(__data, args); \
+ preempt_enable_notrace(); \
}
#undef DECLARE_EVENT_SYSCALL_CLASS
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 4f22136fd465..0504a423ca25 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -436,7 +436,9 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
static notrace void \
trace_event_raw_event_##call(void *__data, proto) \
{ \
+ preempt_disable_notrace(); \
do_trace_event_raw_event_##call(__data, args); \
+ preempt_enable_notrace(); \
}
#undef DECLARE_EVENT_SYSCALL_CLASS
But please add it with the change, as there's "preempt_count" accounting
that reports to the user that preemption was disabled when called.
-- Steve
* Re: [PATCH v4 2/6] srcu: Add srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace()
From: Boqun Feng @ 2025-07-23 20:50 UTC
To: Paul E. McKenney
Cc: rcu, linux-kernel, kernel-team, rostedt, Joel Fernandes,
Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
On Wed, Jul 23, 2025 at 01:27:56PM -0700, Paul E. McKenney wrote:
> This commit adds no-trace variants of the srcu_read_lock_fast() and
> srcu_read_unlock_fast() functions for tracing use.
>
> [ paulmck: Apply notrace feedback from Joel Fernandes, Steven Rostedt, and Mathieu Desnoyers. ]
>
> Link: https://lore.kernel.org/all/20250721162433.10454-1-paulmck@kernel.org
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: <bpf@vger.kernel.org>
> ---
> include/linux/srcu.h | 30 ++++++++++++++++++++++++++++--
> include/linux/srcutree.h | 5 +++--
> 2 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 478c73d067f7d..ec3b8e27d6c5a 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -271,7 +271,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
> * where RCU is watching, that is, from contexts where it would be legal
> * to invoke rcu_read_lock(). Otherwise, lockdep will complain.
> */
> -static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *ssp) __acquires(ssp)
> +static inline struct srcu_ctr __percpu notrace *srcu_read_lock_fast(struct srcu_struct *ssp) __acquires(ssp)
Hmm.. am I missing something, why do we need to make
srcu_read_lock_fast() notrace? I thought we only needed the _notrace()
variants to be notrace?
Regards,
Boqun
> {
> struct srcu_ctr __percpu *retval;
>
> @@ -282,6 +282,20 @@ static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *
> return retval;
> }
>
> +/*
> + * Used by tracing, cannot be traced and cannot call lockdep.
> + * See srcu_read_lock_fast() for more information.
> + */
> +static inline struct srcu_ctr __percpu *srcu_read_lock_fast_notrace(struct srcu_struct *ssp)
> + __acquires(ssp)
> +{
> + struct srcu_ctr __percpu *retval;
> +
> + srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
> + retval = __srcu_read_lock_fast(ssp);
> + return retval;
> +}
> +
> /**
> * srcu_down_read_fast - register a new reader for an SRCU-protected structure.
> * @ssp: srcu_struct in which to register the new reader.
> @@ -385,7 +399,8 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
> *
> * Exit a light-weight SRCU read-side critical section.
> */
> -static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> +static inline void notrace
> +srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> __releases(ssp)
> {
> srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
> @@ -394,6 +409,17 @@ static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ct
> RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast().");
> }
>
> +/*
> + * Used by tracing, cannot be traced and cannot call lockdep.
> + * See srcu_read_unlock_fast() for more information.
> + */
> +static inline void srcu_read_unlock_fast_notrace(struct srcu_struct *ssp,
> + struct srcu_ctr __percpu *scp) __releases(ssp)
> +{
> + srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
> + __srcu_read_unlock_fast(ssp, scp);
> +}
> +
> /**
> * srcu_up_read_fast - unregister a old reader from an SRCU-protected structure.
> * @ssp: srcu_struct in which to unregister the old reader.
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 043b5a67ef71e..4d2fee4d38289 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -240,7 +240,7 @@ static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ss
> * on architectures that support NMIs but do not supply NMI-safe
> * implementations of this_cpu_inc().
> */
> -static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp)
> +static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast(struct srcu_struct *ssp)
> {
> struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
>
> @@ -267,7 +267,8 @@ static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct
> * on architectures that support NMIs but do not supply NMI-safe
> * implementations of this_cpu_inc().
> */
> -static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> +static inline void notrace
> +__srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> {
> barrier(); /* Avoid leaking the critical section. */
> if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
> --
> 2.40.1
>
>
* Re: [PATCH 0/6] Switch __DECLARE_TRACE() to new notrace variant of SRCU-fast
From: Paul E. McKenney @ 2025-07-23 20:54 UTC
To: Steven Rostedt; +Cc: rcu, linux-kernel, kernel-team
On Wed, Jul 23, 2025 at 04:34:50PM -0400, Steven Rostedt wrote:
> On Wed, 23 Jul 2025 13:27:54 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > This series continues to trigger a kernel test robot report of a
> > "using smp_processor_id() in preemptible" splat. I looked for issues
> > with explicit preemption disabling and, not finding any, will next turn
> > my attention to accesses to per-CPU variables. Any and all insights
> > are welcome.
>
> Currently perf and ftrace expect the tracepoints to be called with
> preemption disabled. You may need this:
>
> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index a1754b73a8f5..1b7925a85966 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -71,7 +71,9 @@ perf_trace_##call(void *__data, proto) \
> u64 __count __attribute__((unused)); \
> struct task_struct *__task __attribute__((unused)); \
> \
> + preempt_disable_notrace(); \
> do_perf_trace_##call(__data, args); \
> + preempt_enable_notrace(); \
> }
>
> #undef DECLARE_EVENT_SYSCALL_CLASS
> diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
> index 4f22136fd465..0504a423ca25 100644
> --- a/include/trace/trace_events.h
> +++ b/include/trace/trace_events.h
> @@ -436,7 +436,9 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
> static notrace void \
> trace_event_raw_event_##call(void *__data, proto) \
> { \
> + preempt_disable_notrace(); \
> do_trace_event_raw_event_##call(__data, args); \
> + preempt_enable_notrace(); \
> }
>
> #undef DECLARE_EVENT_SYSCALL_CLASS
>
>
> But please add it with the change, as there's "preempt_count" accounting
> that reports to the user that preemption was disabled when called.
Thank you, Steve! I suspect that it would have taken me one good long
time to find that one, like maybe forever. ;-)
I am doing local testing, then will expose it to the kernel test robot,
and if all goes well, fold it in with attribution.
Thanx, Paul
* Re: [PATCH v4 2/6] srcu: Add srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace()
From: Paul E. McKenney @ 2025-07-23 21:21 UTC
To: Boqun Feng
Cc: rcu, linux-kernel, kernel-team, rostedt, Joel Fernandes,
Mathieu Desnoyers, Sebastian Andrzej Siewior, bpf
On Wed, Jul 23, 2025 at 01:50:35PM -0700, Boqun Feng wrote:
> On Wed, Jul 23, 2025 at 01:27:56PM -0700, Paul E. McKenney wrote:
> > This commit adds no-trace variants of the srcu_read_lock_fast() and
> > srcu_read_unlock_fast() functions for tracing use.
> >
> > [ paulmck: Apply notrace feedback from Joel Fernandes, Steven Rostedt, and Mathieu Desnoyers. ]
> >
> > Link: https://lore.kernel.org/all/20250721162433.10454-1-paulmck@kernel.org
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: <bpf@vger.kernel.org>
> > ---
> > include/linux/srcu.h | 30 ++++++++++++++++++++++++++++--
> > include/linux/srcutree.h | 5 +++--
> > 2 files changed, 31 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index 478c73d067f7d..ec3b8e27d6c5a 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -271,7 +271,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
> > * where RCU is watching, that is, from contexts where it would be legal
> > * to invoke rcu_read_lock(). Otherwise, lockdep will complain.
> > */
> > -static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *ssp) __acquires(ssp)
> > +static inline struct srcu_ctr __percpu notrace *srcu_read_lock_fast(struct srcu_struct *ssp) __acquires(ssp)
>
> Hmm.. am I missing something, why do we need to make
> srcu_read_lock_fast() notrace? I thought we only needed the _notrace()
> variants to be notrace?
Good point! It looks like I got a bit too exuberant here. I am removing
notrace from srcu_read_lock_fast() and srcu_read_unlock_fast(), but
leaving srcu_read_lock_fast_notrace(), srcu_read_unlock_fast_notrace(),
__srcu_read_lock_fast(), and __srcu_read_unlock_fast() as-is.
Thanx, Paul
> Regards,
> Boqun
>
> > {
> > struct srcu_ctr __percpu *retval;
> >
> > @@ -282,6 +282,20 @@ static inline struct srcu_ctr __percpu *srcu_read_lock_fast(struct srcu_struct *
> > return retval;
> > }
> >
> > +/*
> > + * Used by tracing, cannot be traced and cannot call lockdep.
> > + * See srcu_read_lock_fast() for more information.
> > + */
> > +static inline struct srcu_ctr __percpu *srcu_read_lock_fast_notrace(struct srcu_struct *ssp)
> > + __acquires(ssp)
> > +{
> > + struct srcu_ctr __percpu *retval;
> > +
> > + srcu_check_read_flavor_force(ssp, SRCU_READ_FLAVOR_FAST);
> > + retval = __srcu_read_lock_fast(ssp);
> > + return retval;
> > +}
> > +
> > /**
> > * srcu_down_read_fast - register a new reader for an SRCU-protected structure.
> > * @ssp: srcu_struct in which to register the new reader.
> > @@ -385,7 +399,8 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > *
> > * Exit a light-weight SRCU read-side critical section.
> > */
> > -static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> > +static inline void notrace
> > +srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> > __releases(ssp)
> > {
> > srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
> > @@ -394,6 +409,17 @@ static inline void srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ct
> > RCU_LOCKDEP_WARN(!rcu_is_watching(), "RCU must be watching srcu_read_unlock_fast().");
> > }
> >
> > +/*
> > + * Used by tracing, cannot be traced and cannot call lockdep.
> > + * See srcu_read_unlock_fast() for more information.
> > + */
> > +static inline void srcu_read_unlock_fast_notrace(struct srcu_struct *ssp,
> > + struct srcu_ctr __percpu *scp) __releases(ssp)
> > +{
> > + srcu_check_read_flavor(ssp, SRCU_READ_FLAVOR_FAST);
> > + __srcu_read_unlock_fast(ssp, scp);
> > +}
> > +
> > /**
> > * srcu_up_read_fast - unregister a old reader from an SRCU-protected structure.
> > * @ssp: srcu_struct in which to unregister the old reader.
> > diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> > index 043b5a67ef71e..4d2fee4d38289 100644
> > --- a/include/linux/srcutree.h
> > +++ b/include/linux/srcutree.h
> > @@ -240,7 +240,7 @@ static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ss
> > * on architectures that support NMIs but do not supply NMI-safe
> > * implementations of this_cpu_inc().
> > */
> > -static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct *ssp)
> > +static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast(struct srcu_struct *ssp)
> > {
> > struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
> >
> > @@ -267,7 +267,8 @@ static inline struct srcu_ctr __percpu *__srcu_read_lock_fast(struct srcu_struct
> > * on architectures that support NMIs but do not supply NMI-safe
> > * implementations of this_cpu_inc().
> > */
> > -static inline void __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> > +static inline void notrace
> > +__srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> > {
> > barrier(); /* Avoid leaking the critical section. */
> > if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
> > --
> > 2.40.1
> >
> >
* Re: [PATCH v4 4/6] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
From: Mathieu Desnoyers @ 2025-07-23 21:40 UTC
To: Paul E. McKenney, rcu
Cc: linux-kernel, kernel-team, rostedt, Sebastian Andrzej Siewior,
bpf
On 2025-07-23 16:27, Paul E. McKenney wrote:
> The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
> to protect invocation of __DO_TRACE_CALL() means that BPF programs
> attached to tracepoints are non-preemptible. This is unhelpful in
> real-time systems, whose users apparently wish to use BPF while also
> achieving low latencies. (Who knew?)
>
> One option would be to use preemptible RCU, but this introduces
> many opportunities for infinite recursion, which many consider to
> be counterproductive, especially given the relatively small stacks
> provided by the Linux kernel. These opportunities could be shut down
> by sufficiently energetic duplication of code, but this sort of thing
> is considered impolite in some circles.
>
> Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
> readers than those of preemptible RCU, at least on my laptop, where
> task_struct access is more expensive than access to per-CPU variables.
> And SRCU-fast provides way faster readers than does SRCU, courtesy of
> being able to avoid the read-side use of smp_mb(). Also, it is quite
> straightforward to create srcu_read_{,un}lock_fast_notrace() functions.
As-is, this will break the tracer callbacks, because some tracers expect
the tracepoint callback to be called with preemption off for various
reasons, including preventing migration.
We'd first need to add preempt-off guards in the tracer callbacks that
require them before making this change.
I've done something similar for the syscall tracepoints when introducing
faultable syscall tracepoints:
4aadde89d81f tracing/bpf: disable preemption in syscall probe
65e7462a16ce tracing/perf: disable preemption in syscall probe
13d750c2c03e tracing/ftrace: disable preemption in syscall probe
Thanks,
Mathieu
>
> While in the area, note that SRCU now supports early boot call_srcu().
> Therefore, omit the checks in rcu_free_old_probes() that avoided such
> use before this commit was applied:
>
> e53244e2c893 ("tracepoint: Remove SRCU protection")
>
> The current commit can be thought of as an approximate revert of that
> commit.
>
> Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: <bpf@vger.kernel.org>
> ---
> include/linux/tracepoint.h | 6 ++++--
> kernel/tracepoint.c | 21 ++++++++++++++++++++-
> 2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 826ce3f8e1f85..a22c1ab88560b 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -33,6 +33,8 @@ struct trace_eval_map {
>
> #define TRACEPOINT_DEFAULT_PRIO 10
>
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -115,7 +117,7 @@ void for_each_tracepoint_in_module(struct module *mod,
> static inline void tracepoint_synchronize_unregister(void)
> {
> synchronize_rcu_tasks_trace();
> - synchronize_rcu();
> + synchronize_srcu(&tracepoint_srcu);
> }
> static inline bool tracepoint_is_faultable(struct tracepoint *tp)
> {
> @@ -271,7 +273,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> static inline void __do_trace_##name(proto) \
> { \
> if (cond) { \
> - guard(preempt_notrace)(); \
> + guard(srcu_fast_notrace)(&tracepoint_srcu); \
> __DO_TRACE_CALL(name, TP_ARGS(args)); \
> } \
> } \
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 62719d2941c90..e19973015cbd7 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -25,6 +25,9 @@ enum tp_func_state {
> extern tracepoint_ptr_t __start___tracepoints_ptrs[];
> extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
>
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
> enum tp_transition_sync {
> TP_TRANSITION_SYNC_1_0_1,
> TP_TRANSITION_SYNC_N_2_1,
> @@ -34,6 +37,7 @@ enum tp_transition_sync {
>
> struct tp_transition_snapshot {
> unsigned long rcu;
> + unsigned long srcu_gp;
> bool ongoing;
> };
>
> @@ -46,6 +50,7 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
>
> /* Keep the latest get_state snapshot. */
> snapshot->rcu = get_state_synchronize_rcu();
> + snapshot->srcu_gp = start_poll_synchronize_srcu(&tracepoint_srcu);
> snapshot->ongoing = true;
> }
>
> @@ -56,6 +61,8 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
> if (!snapshot->ongoing)
> return;
> cond_synchronize_rcu(snapshot->rcu);
> + if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu_gp))
> + synchronize_srcu(&tracepoint_srcu);
> snapshot->ongoing = false;
> }
>
> @@ -101,17 +108,29 @@ static inline void *allocate_probes(int count)
> return p == NULL ? NULL : p->probes;
> }
>
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
> {
> kfree(container_of(head, struct tp_probes, rcu));
> }
>
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> + call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
> static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
> {
> if (old) {
> struct tp_probes *tp_probes = container_of(old,
> struct tp_probes, probes[0]);
>
> + /*
> + * Tracepoint probes are protected by either RCU or
> + * Tasks Trace RCU and also by SRCU. By calling the SRCU
> + * callback in the [Tasks Trace] RCU callback we cover
> + * both cases. So let us chain the SRCU and [Tasks Trace]
> + * RCU callbacks to wait for both grace periods.
> + */
> if (tracepoint_is_faultable(tp))
> call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
> else
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
* Re: [PATCH v4 4/6] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
From: Paul E. McKenney @ 2025-07-23 22:17 UTC
To: Mathieu Desnoyers
Cc: rcu, linux-kernel, kernel-team, rostedt,
Sebastian Andrzej Siewior, bpf
On Wed, Jul 23, 2025 at 05:40:20PM -0400, Mathieu Desnoyers wrote:
> On 2025-07-23 16:27, Paul E. McKenney wrote:
> > The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
> > to protect invocation of __DO_TRACE_CALL() means that BPF programs
> > attached to tracepoints are non-preemptible. This is unhelpful in
> > real-time systems, whose users apparently wish to use BPF while also
> > achieving low latencies. (Who knew?)
> >
> > One option would be to use preemptible RCU, but this introduces
> > many opportunities for infinite recursion, which many consider to
> > be counterproductive, especially given the relatively small stacks
> > provided by the Linux kernel. These opportunities could be shut down
> > by sufficiently energetic duplication of code, but this sort of thing
> > is considered impolite in some circles.
> >
> > Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
> > readers than those of preemptible RCU, at least on my laptop, where
> > task_struct access is more expensive than access to per-CPU variables.
> > And SRCU-fast provides way faster readers than does SRCU, courtesy of
> > being able to avoid the read-side use of smp_mb(). Also, it is quite
> > straightforward to create srcu_read_{,un}lock_fast_notrace() functions.
>
> As-is, this will break the tracer callbacks, because some tracers expect
> the tracepoint callback to be called with preemption off for various
> reasons, including preventing migration.
>
> We'd first need to add preempt-off guards in the tracer callbacks that
> require them before making this change.
>
> I've done something similar for the syscall tracepoints when introducing
> faultable syscall tracepoints:
>
> 4aadde89d81f tracing/bpf: disable preemption in syscall probe
> 65e7462a16ce tracing/perf: disable preemption in syscall probe
> 13d750c2c03e tracing/ftrace: disable preemption in syscall probe
Thank you, Mathieu!
I believe that Steve provided me with the essentials for perf and ftrace,
but please check: f808f53d4e4f ("squash! tracing: Guard __DECLARE_TRACE()
use of __DO_TRACE_CALL() with SRCU-fast").
On BPF, given that a major point of this exercise is that BPF programs be
preemptible, if we do need additional disabling of preemption, presumably
there would also need to be countervailing enabling somewhere on the
BPF side?
Thanx, Paul
> Thanks,
>
> Mathieu
>
> >
> > While in the area, note that SRCU now supports early boot call_srcu().
> > Therefore, omit the checks in rcu_free_old_probes() that avoided such
> > use before this commit was applied:
> >
> > e53244e2c893 ("tracepoint: Remove SRCU protection")
> >
> > The current commit can be thought of as an approximate revert of that
> > commit.
> >
> > Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: <bpf@vger.kernel.org>
> > ---
> > include/linux/tracepoint.h | 6 ++++--
> > kernel/tracepoint.c | 21 ++++++++++++++++++++-
> > 2 files changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> > index 826ce3f8e1f85..a22c1ab88560b 100644
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -33,6 +33,8 @@ struct trace_eval_map {
> > #define TRACEPOINT_DEFAULT_PRIO 10
> > +extern struct srcu_struct tracepoint_srcu;
> > +
> > extern int
> > tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> > extern int
> > @@ -115,7 +117,7 @@ void for_each_tracepoint_in_module(struct module *mod,
> > static inline void tracepoint_synchronize_unregister(void)
> > {
> > synchronize_rcu_tasks_trace();
> > - synchronize_rcu();
> > + synchronize_srcu(&tracepoint_srcu);
> > }
> > static inline bool tracepoint_is_faultable(struct tracepoint *tp)
> > {
> > @@ -271,7 +273,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> > static inline void __do_trace_##name(proto) \
> > { \
> > if (cond) { \
> > - guard(preempt_notrace)(); \
> > + guard(srcu_fast_notrace)(&tracepoint_srcu); \
> > __DO_TRACE_CALL(name, TP_ARGS(args)); \
> > } \
> > } \
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index 62719d2941c90..e19973015cbd7 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -25,6 +25,9 @@ enum tp_func_state {
> > extern tracepoint_ptr_t __start___tracepoints_ptrs[];
> > extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
> > +DEFINE_SRCU(tracepoint_srcu);
> > +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> > +
> > enum tp_transition_sync {
> > TP_TRANSITION_SYNC_1_0_1,
> > TP_TRANSITION_SYNC_N_2_1,
> > @@ -34,6 +37,7 @@ enum tp_transition_sync {
> > struct tp_transition_snapshot {
> > unsigned long rcu;
> > + unsigned long srcu_gp;
> > bool ongoing;
> > };
> > @@ -46,6 +50,7 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
> > /* Keep the latest get_state snapshot. */
> > snapshot->rcu = get_state_synchronize_rcu();
> > + snapshot->srcu_gp = start_poll_synchronize_srcu(&tracepoint_srcu);
> > snapshot->ongoing = true;
> > }
> > @@ -56,6 +61,8 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
> > if (!snapshot->ongoing)
> > return;
> > cond_synchronize_rcu(snapshot->rcu);
> > + if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu_gp))
> > + synchronize_srcu(&tracepoint_srcu);
> > snapshot->ongoing = false;
> > }
> > @@ -101,17 +108,29 @@ static inline void *allocate_probes(int count)
> > return p == NULL ? NULL : p->probes;
> > }
> > -static void rcu_free_old_probes(struct rcu_head *head)
> > +static void srcu_free_old_probes(struct rcu_head *head)
> > {
> > kfree(container_of(head, struct tp_probes, rcu));
> > }
> > +static void rcu_free_old_probes(struct rcu_head *head)
> > +{
> > + call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> > +}
> > +
> > static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
> > {
> > if (old) {
> > struct tp_probes *tp_probes = container_of(old,
> > struct tp_probes, probes[0]);
> > + /*
> > + * Tracepoint probes are protected by either RCU or
> > + * Tasks Trace RCU and also by SRCU. By calling the SRCU
> > + * callback in the [Tasks Trace] RCU callback we cover
> > + * both cases. So let us chain the SRCU and [Tasks Trace]
> > + * RCU callbacks to wait for both grace periods.
> > + */
> > if (tracepoint_is_faultable(tp))
> > call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
> > else
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
* Re: [PATCH v4 4/6] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
From: Steven Rostedt @ 2025-07-23 22:29 UTC
To: Paul E. McKenney
Cc: Mathieu Desnoyers, rcu, linux-kernel, kernel-team,
Sebastian Andrzej Siewior, bpf
On Wed, 23 Jul 2025 15:17:52 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:
> I believe that Steve provided me with the essentials for perf and ftrace,
> but please check: f808f53d4e4f ("squash! tracing: Guard __DECLARE_TRACE()
> use of __DO_TRACE_CALL() with SRCU-fast").
Note, there's nothing on the ftrace side that requires preemption to be
disabled, but it assumes that it is, and adjusts the preempt_count that is
recorded in the trace event to accommodate it.
-- Steve
* Re: [PATCH v4 4/6] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast
From: Paul E. McKenney @ 2025-07-23 22:51 UTC
To: Steven Rostedt
Cc: Mathieu Desnoyers, rcu, linux-kernel, kernel-team,
Sebastian Andrzej Siewior, bpf
On Wed, Jul 23, 2025 at 06:29:30PM -0400, Steven Rostedt wrote:
> On Wed, 23 Jul 2025 15:17:52 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > I believe that Steve provided me with the essentials for perf and ftrace,
> > but please check: f808f53d4e4f ("squash! tracing: Guard __DECLARE_TRACE()
> > use of __DO_TRACE_CALL() with SRCU-fast").
>
> Note, there's nothing on the ftrace side that requires preemption to be
> disabled, but it assumes that it is, and adjusts the preempt_count that is
> recorded in the trace event to accommodate it.
Ah, thank you for the clarification. I agree with your approach as being
a more localized change, with less chance of some forgotten invariant
biting us. ;-)
Thanx, Paul