public inbox for linux-rt-devel@lists.linux.dev
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: paulmck@kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Boqun Feng <boqun.feng@gmail.com>,
	linux-rt-devel@lists.linux.dev, rcu@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	Frederic Weisbecker <frederic@kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Uladzislau Rezki <urezki@gmail.com>,
	Zqiang <qiang.zhang@linux.dev>,
	bpf@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] rcu: Add rcu_read_lock_notrace()
Date: Thu, 17 Jul 2025 15:36:46 -0400	[thread overview]
Message-ID: <5dc49f5a-ddda-422b-a8af-c662ee53d503@efficios.com> (raw)
In-Reply-To: <2f8bb8bb-320e-480f-9a56-8eb5cbd4438a@paulmck-laptop>

On 2025-07-17 11:18, Paul E. McKenney wrote:
> On Thu, Jul 17, 2025 at 10:46:46AM -0400, Mathieu Desnoyers wrote:
>> On 2025-07-17 09:14, Mathieu Desnoyers wrote:
>>> On 2025-07-16 18:54, Paul E. McKenney wrote:
>> [...]
>>>
>>> 2) I think I'm late to the party in reviewing srcu-fast, I'll
>>>      go have a look :)
>>
>> OK, I'll bite. :) Please let me know where I'm missing something:
>>
>> Looking at srcu-lite and srcu-fast, I understand that they fundamentally
>> depend on a trick we published here https://lwn.net/Articles/573497/
>> "The RCU-barrier menagerie" that allows turning, e.g. this Dekker:
>>
>> volatile int x = 0, y = 0
>>
>> CPU 0              CPU 1
>>
>> x = 1              y = 1
>> smp_mb             smp_mb
>> r2 = y             r4 = x
>>
>> BUG_ON(r2 == 0 && r4 == 0)
>>
>> into
>>
>> volatile int x = 0, y = 0
>>
>> CPU 0            CPU 1
>>
>> rcu_read_lock()
>> x = 1              y = 1
>>                     synchronize_rcu()
>> r2 = y             r4 = x
>> rcu_read_unlock()
>>
>> BUG_ON(r2 == 0 && r4 == 0)
>>
>> So looking at srcu-fast, we have:
>>
>>   * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side
>>   * critical sections either because they disables interrupts, because they
>>   * are a single instruction, or because they are a read-modify-write atomic
>>   * operation, depending on the whims of the architecture.
>>
>> It appears to be pairing, as RCU read-side:
>>
>> - irq off/on implied by this_cpu_inc
>> - atomic
>> - single instruction
>>
>> with synchronize_rcu within the grace period, and hope that this behaves as a
>> smp_mb pairing preventing the srcu read-side critical section from leaking
>> out of the srcu read lock/unlock.
>>
>> I note that there is a validation that rcu_is_watching() within
>> __srcu_read_lock_fast, but it's one thing to have rcu watching, but
>> another to have an actual read-side critical section. Note that
>> preemption, irqs, softirqs can very well be enabled when calling
>> __srcu_read_lock_fast.
>>
>> My understanding of the how memory barriers implemented with RCU
>> work is that we need to surround the memory accesses on the fast-path
>> (where we turn smp_mb into barrier) with an RCU read-side critical
>> section to make sure it does not spawn across a synchronize_rcu.
>>
>> What I am missing here is how can a RCU side-side that only consist
>> of the irq off/on or atomic or single instruction cover all memory
>> accesses we are trying to order, namely those within the srcu
>> critical section after the compiler barrier() ? Is having RCU
>> watching sufficient to guarantee this ?
> 
> Good eyes!!!
> 
> The trick is that this "RCU read-side critical section" consists only of
> either this_cpu_inc() or atomic_long_inc(), with the latter only happening
> in systems that have NMIs, but don't have NMI-safe per-CPU operations.
> Neither this_cpu_inc() nor atomic_long_inc() can be interrupted, and
> thus both act as an interrupts-disabled RCU read-side critical section.
> 
> Therefore, if the SRCU grace-period computation fails to see an
> srcu_read_lock_fast() increment, its earlier code is guaranteed to
> happen before the corresponding critical section.  Similarly, if the SRCU
> grace-period computation sees an srcu_read_unlock_fast(), its subsequent
> code is guaranteed to happen after the corresponding critical section.
> 
> Does that help?  If so, would you be interested and nominating a comment?
> 
> Or am I missing something subtle here?

Here is the root of my concern: considering a single instruction
as an RCU-barrier "read-side" for a classic Dekker would not work,
because the read-side would not cover both memory accesses that need
to be ordered.

I cannot help but notice the similarity between this pattern of
barrier vs synchronize_rcu and what we allow userspace to do with
barrier vs sys_membarrier, which has one implementation
based on synchronize_rcu (except for TICK_NOHZ_FULL). Originally
when membarrier was introduced, this was based on synchronize_sched(),
and I recall that this was OK because userspace execution acted as
a read-side critical section from the perspective of synchronize_sched().
As commented in kernel v4.10 near synchronize_sched():

  * Note that this guarantee implies further memory-ordering guarantees.
  * On systems with more than one CPU, when synchronize_sched() returns,
  * each CPU is guaranteed to have executed a full memory barrier since the
  * end of its last RCU-sched read-side critical section whose beginning
  * preceded the call to synchronize_sched().  In addition, each CPU having
  * an RCU read-side critical section that extends beyond the return from
  * synchronize_sched() is guaranteed to have executed a full memory barrier
  * after the beginning of synchronize_sched() and before the beginning of
  * that RCU read-side critical section.  Note that these guarantees include
  * CPUs that are offline, idle, or executing in user mode, as well as CPUs
  * that are executing in the kernel.

So even though I see how synchronize_rcu() nowadays is still a good
choice to implement sys_membarrier, it only apply to RCU read side
critical sections, which covers userspace code and the specific
read-side critical sections in the kernel.

But what I don't get is how synchronize_rcu() can help us promote
the barrier() in SRCU-fast to smp_mb when outside of any RCU read-side
critical section tracked by the synchronize_rcu grace period,
mainly because unlike the sys_membarrier scenario, this is *not*
userspace code.

And what we want to order here on the read-side is the lock/unlock
increments vs the memory accesses within the critical section, but
there is no RCU read-side that contain all those memory accesses
that match those synchronize_rcu calls, so the promotion from barrier
to smp_mb don't appear to be valid.

But perhaps there is something more that is specific to the SRCU
algorithm that I missing here ?

Thanks,

Mathieu

> 
> Either way, many thanks for digging into this!!!
> 
> 							Thanx, Paul
> 
>> Thanks,
>>
>> Mathieu
>>
>> -- 
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

  reply	other threads:[~2025-07-17 19:36 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-13 15:22 [RFC PATCH 0/2] Switch tracing from sched-RCU to preempt-RCU Sebastian Andrzej Siewior
2025-06-13 15:22 ` [RFC PATCH 1/2] rcu: Add rcu_read_lock_notrace() Sebastian Andrzej Siewior
2025-06-18 17:21   ` Boqun Feng
2025-06-20  8:43     ` Sebastian Andrzej Siewior
2025-06-20 11:23       ` Paul E. McKenney
2025-06-23 10:49         ` Sebastian Andrzej Siewior
2025-06-23 18:13           ` Paul E. McKenney
2025-07-07 21:56             ` Paul E. McKenney
2025-07-08 19:40               ` Mathieu Desnoyers
2025-07-08 20:49                 ` Paul E. McKenney
2025-07-09 14:31                   ` Mathieu Desnoyers
2025-07-09 18:33                     ` Paul E. McKenney
2025-07-11 13:46                       ` Mathieu Desnoyers
2025-07-11 17:05                         ` Paul E. McKenney
2025-07-14 16:34                           ` Paul E. McKenney
2025-07-15 19:56                             ` Mathieu Desnoyers
2025-07-15 23:23                               ` Paul E. McKenney
2025-07-15 19:54                           ` Mathieu Desnoyers
2025-07-15 23:18                             ` Paul E. McKenney
2025-07-16  0:42                               ` Paul E. McKenney
2025-07-16  4:41                                 ` Paul E. McKenney
2025-07-16 15:09                           ` Steven Rostedt
2025-07-16 20:35                             ` Paul E. McKenney
2025-07-16 22:54                               ` Paul E. McKenney
2025-07-17 13:14                                 ` Mathieu Desnoyers
2025-07-17 14:46                                   ` Mathieu Desnoyers
2025-07-17 15:18                                     ` Paul E. McKenney
2025-07-17 19:36                                       ` Mathieu Desnoyers [this message]
2025-07-17 21:27                                         ` Paul E. McKenney
2026-01-07  1:26                                         ` Joel Fernandes
2026-01-06 15:08                                       ` Mathieu Desnoyers
2026-01-06 18:30                                         ` Paul E. McKenney
2026-01-06 18:43                                           ` Mathieu Desnoyers
2025-07-17 14:57                                   ` Alexei Starovoitov
2025-07-17 15:12                                     ` Steven Rostedt
2025-07-17 15:27                                       ` Alexei Starovoitov
2025-07-17 15:40                                         ` Steven Rostedt
2025-07-17 15:55                                           ` Steven Rostedt
2025-07-17 16:02                                             ` Alexei Starovoitov
2025-07-17 16:19                                               ` Steven Rostedt
2025-07-17 17:38                                               ` Mathieu Desnoyers
2025-07-17 16:04                                             ` Paul E. McKenney
2025-07-17 15:44                                         ` Paul E. McKenney
2025-07-17 15:30                                     ` Paul E. McKenney
2025-07-17 15:07                                   ` Paul E. McKenney
2025-07-17 19:04                                 ` [PATCH RFC 6/4] srcu: Add guards for SRCU-fast readers Paul E. McKenney
2025-07-17 19:19                                   ` Steven Rostedt
2025-07-17 19:51                                     ` Paul E. McKenney
2025-07-17 19:56                                       ` Steven Rostedt
2025-07-17 20:38                                         ` Paul E. McKenney
2025-07-19  0:28                                 ` [RFC PATCH 1/2] rcu: Add rcu_read_lock_notrace() Paul E. McKenney
2025-06-13 15:22 ` [RFC PATCH 2/2] trace: Use rcu_read_lock() instead preempt_disable() Sebastian Andrzej Siewior
2025-06-13 15:38 ` [RFC PATCH 0/2] Switch tracing from sched-RCU to preempt-RCU Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5dc49f5a-ddda-422b-a8af-c662ee53d503@efficios.com \
    --to=mathieu.desnoyers@efficios.com \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joelagnelf@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=qiang.zhang@linux.dev \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=urezki@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox