From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
rostedt@goodmis.org, "Paul E. McKenney" <paulmck@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
bpf@vger.kernel.org
Subject: [PATCH v4 5/9] rcu: Add noinstr-fast rcu_read_{,un}lock_tasks_trace() APIs
Date: Mon, 29 Dec 2025 11:11:00 -0800
Message-ID: <20251229191104.693447-5-paulmck@kernel.org>
In-Reply-To: <b206b083-f611-43b6-993f-78ddbe315813@paulmck-laptop>
When expressing RCU Tasks Trace in terms of SRCU-fast, it was
necessary to keep a nesting count and a per-CPU srcu_ctr structure
pointer in the task_struct structure, which is slow to access.
An alternative is to instead provide rcu_read_lock_tasks_trace()
and rcu_read_unlock_tasks_trace() APIs that match the underlying
SRCU-fast semantics, thus avoiding those task_struct accesses.
When all callers have switched to the new API, the previous
rcu_read_lock_trace() and rcu_read_unlock_trace() APIs will be removed.
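For illustration, converting a hypothetical caller from the old API to
the new one might look like the following sketch, which is not part of
this patch (do_something_protected() is a made-up placeholder):

```c
	/* Before: counter-based reader, task_struct accesses on both ends. */
	rcu_read_lock_trace();
	do_something_protected();
	rcu_read_unlock_trace();

	/*
	 * After: SRCU-fast-style reader.  The cookie returned by the
	 * lock operation must be passed to the matching unlock.
	 */
	struct srcu_ctr __percpu *scp;

	scp = rcu_read_lock_tasks_trace();
	do_something_protected();
	rcu_read_unlock_tasks_trace(scp);
```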
The rcu_read_{,un}lock_{,tasks_}trace() functions need to use smp_mb()
only if invoked where RCU is not watching, that is, from locations where
a call to rcu_is_watching() would return false. In architectures that
define the ARCH_WANTS_NO_INSTR Kconfig option, use of noinstr and friends
ensures that tracing happens only where RCU is watching, so those
architectures can dispense entirely with the read-side calls to smp_mb().
Other architectures include these read-side calls by default. However,
many installations might have a larger-than-average tolerance for risk,
might prohibit removal of tracing on a running system, or might require
careful review and approval before tracing is removed. Such installations can
build their kernels with CONFIG_TASKS_TRACE_RCU_NO_MB=y to avoid those
read-side calls to smp_mb(), thus accepting responsibility for run-time
removal of tracing from code regions that RCU is not watching.
Those wishing to disable read-side memory barriers for an entire
architecture can select this TASKS_TRACE_RCU_NO_MB Kconfig option,
hence the polarity.
[ paulmck: Apply Peter Zijlstra feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <bpf@vger.kernel.org>
---
include/linux/rcupdate_trace.h | 65 +++++++++++++++++++++++++++++-----
kernel/rcu/Kconfig | 23 ++++++++++++
2 files changed, 80 insertions(+), 8 deletions(-)
diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h
index 0bd47f12ecd17..f47ba9c074601 100644
--- a/include/linux/rcupdate_trace.h
+++ b/include/linux/rcupdate_trace.h
@@ -34,6 +34,53 @@ static inline int rcu_read_lock_trace_held(void)
#ifdef CONFIG_TASKS_TRACE_RCU
+/**
+ * rcu_read_lock_tasks_trace - mark beginning of RCU-trace read-side critical section
+ *
+ * When synchronize_rcu_tasks_trace() is invoked by one task, then that
+ * task is guaranteed to block until all other tasks exit their read-side
+ * critical sections. Similarly, if call_rcu_tasks_trace() is invoked on one
+ * task while other tasks are within RCU read-side critical sections,
+ * invocation of the corresponding RCU callback is deferred until after
+ * all the other tasks exit their critical sections.
+ *
+ * For more details, please see the documentation for
+ * srcu_read_lock_fast(). For a description of how implicit RCU
+ * readers provide the needed ordering for architectures defining the
+ * ARCH_WANTS_NO_INSTR Kconfig option (and thus promising never to trace
+ * code where RCU is not watching), please see the __srcu_read_lock_fast()
+ * (non-kerneldoc) header comment. Otherwise, the smp_mb() below provides
+ * the needed ordering.
+ */
+static inline struct srcu_ctr __percpu *rcu_read_lock_tasks_trace(void)
+{
+ struct srcu_ctr __percpu *ret = __srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);
+
+ rcu_try_lock_acquire(&rcu_tasks_trace_srcu_struct.dep_map);
+ if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU_NO_MB))
+ smp_mb(); // Provide ordering on noinstr-incomplete architectures.
+ return ret;
+}
+
+/**
+ * rcu_read_unlock_tasks_trace - mark end of RCU-trace read-side critical section
+ * @scp: return value from corresponding rcu_read_lock_tasks_trace().
+ *
+ * Pairs with the preceding call to rcu_read_lock_tasks_trace() that
+ * returned the value passed in via scp.
+ *
+ * For more details, please see the documentation for rcu_read_unlock().
+ * For memory-ordering information, please see the header comment for the
+ * rcu_read_lock_tasks_trace() function.
+ */
+static inline void rcu_read_unlock_tasks_trace(struct srcu_ctr __percpu *scp)
+{
+ if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU_NO_MB))
+ smp_mb(); // Provide ordering on noinstr-incomplete architectures.
+ __srcu_read_unlock_fast(&rcu_tasks_trace_srcu_struct, scp);
+ srcu_lock_release(&rcu_tasks_trace_srcu_struct.dep_map);
+}
+
/**
* rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
*
@@ -50,14 +97,15 @@ static inline void rcu_read_lock_trace(void)
{
struct task_struct *t = current;
+ rcu_try_lock_acquire(&rcu_tasks_trace_srcu_struct.dep_map);
if (t->trc_reader_nesting++) {
// In case we interrupted a Tasks Trace RCU reader.
- rcu_try_lock_acquire(&rcu_tasks_trace_srcu_struct.dep_map);
return;
}
barrier(); // nesting before scp to protect against interrupt handler.
- t->trc_reader_scp = srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);
- smp_mb(); // Placeholder for more selective ordering
+ t->trc_reader_scp = __srcu_read_lock_fast(&rcu_tasks_trace_srcu_struct);
+ if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU_NO_MB))
+ smp_mb(); // Placeholder for more selective ordering
}
/**
@@ -74,13 +122,14 @@ static inline void rcu_read_unlock_trace(void)
struct srcu_ctr __percpu *scp;
struct task_struct *t = current;
- smp_mb(); // Placeholder for more selective ordering
scp = t->trc_reader_scp;
barrier(); // scp before nesting to protect against interrupt handler.
- if (!--t->trc_reader_nesting)
- srcu_read_unlock_fast(&rcu_tasks_trace_srcu_struct, scp);
- else
- srcu_lock_release(&rcu_tasks_trace_srcu_struct.dep_map);
+ if (!--t->trc_reader_nesting) {
+ if (!IS_ENABLED(CONFIG_TASKS_TRACE_RCU_NO_MB))
+ smp_mb(); // Placeholder for more selective ordering
+ __srcu_read_unlock_fast(&rcu_tasks_trace_srcu_struct, scp);
+ }
+ srcu_lock_release(&rcu_tasks_trace_srcu_struct.dep_map);
}
/**
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index c381a31301168..762299291e09b 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -142,6 +142,29 @@ config TASKS_TRACE_RCU
default n
select IRQ_WORK
+config TASKS_TRACE_RCU_NO_MB
+ bool "Override RCU Tasks Trace inclusion of read-side memory barriers"
+ depends on RCU_EXPERT && TASKS_TRACE_RCU
+ default ARCH_WANTS_NO_INSTR
+ help
+ This option prevents the use of read-side memory barriers in
+ rcu_read_lock_tasks_trace() and rcu_read_unlock_tasks_trace()
+ even in kernels built with CONFIG_ARCH_WANTS_NO_INSTR=n, that is,
+ in kernels that do not have noinstr set up in entry/exit code.
+ By setting this option, you are promising to carefully review
+ use of ftrace, BPF, and friends to ensure that no tracing
+ operation is attached to a function that runs in that portion
+ of the entry/exit code that RCU does not watch, that is,
+ where rcu_is_watching() returns false. Alternatively, you
+ might choose to never remove traces except by rebooting.
+
+ Those wishing to disable read-side memory barriers for an entire
+ architecture can select this Kconfig option, hence the polarity.
+
+ Say Y here if you need speed and will review use of tracing.
+ Say N here for certain esoteric testing of RCU itself.
+ Take the default if you are unsure.
+
config RCU_STALL_COMMON
def_bool TREE_RCU
help
--
2.40.1