linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
       [not found] <bb177afd-eea8-4a2a-9600-e36ada26a500@paulmck-laptop>
@ 2025-11-05 20:32 ` Paul E. McKenney
  2025-11-08 13:07   ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2025-11-05 20:32 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Catalin Marinas, Will Deacon, Mark Rutland, Mathieu Desnoyers,
	Sebastian Andrzej Siewior, linux-arm-kernel, bpf

Some arm64 platforms have slow per-CPU atomic operations, for example,
the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
per-CPU atomic operations to interrupt-disabled
non-read-modify-write-atomic atomic_read()/atomic_set() operations.
This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
is not invoked from NMI handlers.  This means that
srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
exclude themselves and each other simply by disabling interrupts
across the counter update.
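
For illustration only, here is a minimal userspace C sketch of the two
counter-update shapes contrasted above: a read-modify-write atomic
increment versus a plain read followed by a write.  C11 atomics merely
stand in for the kernel's per-CPU atomic_long_t counters, and the
sketch elides the per-CPU indexing and interrupt disabling that make
the non-RMW form safe in the kernel:

#include <stdatomic.h>
#include <stdio.h>

static atomic_long counter;

/* Old shape: one read-modify-write atomic per call. */
static void inc_rmw(void)
{
	atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}

/*
 * New shape: separate load and store, no RMW atomic.  In the kernel
 * this is safe only because the counter is per-CPU and interrupts
 * are disabled across the update; nothing enforces that here.
 */
static void inc_plain(void)
{
	long v = atomic_load_explicit(&counter, memory_order_relaxed);

	atomic_store_explicit(&counter, v + 1, memory_order_relaxed);
}

int main(void)
{
	inc_rmw();
	inc_plain();
	printf("counter = %ld\n",
	       (long)atomic_load_explicit(&counter, memory_order_relaxed));
	return 0;
}

On cores where the RMW form is expensive, the patch below trades the
atomic instruction for a brief interrupt-disabled window around the
plain update.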

This reduces the overhead of calls to srcu_read_lock_fast_updown() and
srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
it sure beats 100ns.

This command was used to measure the overhead:

tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: <bpf@vger.kernel.org>
---
 include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index d6f978b50472..0e06f87e1d7c 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -253,6 +253,34 @@ static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ss
 	return &ssp->sda->srcu_ctrs[idx];
 }
 
+/*
+ * Non-atomic manipulation of SRCU lock counters.
+ */
+static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast_na(struct srcu_struct *ssp)
+{
+	atomic_long_t *scnp;
+	struct srcu_ctr __percpu *scp;
+
+	lockdep_assert_preemption_disabled();
+	scp = READ_ONCE(ssp->srcu_ctrp);
+	scnp = raw_cpu_ptr(&scp->srcu_locks);
+	atomic_long_set(scnp, atomic_long_read(scnp) + 1);
+	return scp;
+}
+
+/*
+ * Non-atomic manipulation of SRCU unlock counters.
+ */
+static inline void notrace
+__srcu_read_unlock_fast_na(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
+{
+	atomic_long_t *scnp;
+
+	lockdep_assert_preemption_disabled();
+	scnp = raw_cpu_ptr(&scp->srcu_unlocks);
+	atomic_long_set(scnp, atomic_long_read(scnp) + 1);
+}
+
 /*
  * Counts the new reader in the appropriate per-CPU element of the
  * srcu_struct.  Returns a pointer that must be passed to the matching
@@ -327,8 +355,18 @@ __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
 static inline
 struct srcu_ctr __percpu notrace *__srcu_read_lock_fast_updown(struct srcu_struct *ssp)
 {
-	struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
+	struct srcu_ctr __percpu *scp;
 
+	if (IS_ENABLED(CONFIG_ARM64) && IS_ENABLED(CONFIG_ARM64_USE_LSE_PERCPU_ATOMICS)) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		scp = __srcu_read_lock_fast_na(ssp);
+		local_irq_restore(flags); /* Avoids leaking the critical section. */
+		return scp;
+	}
+
+	scp = READ_ONCE(ssp->srcu_ctrp);
 	if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
 		this_cpu_inc(scp->srcu_locks.counter); // Y, and implicit RCU reader.
 	else
@@ -350,10 +388,17 @@ static inline void notrace
 __srcu_read_unlock_fast_updown(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
 {
 	barrier();  /* Avoid leaking the critical section. */
-	if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
+	if (IS_ENABLED(CONFIG_ARM64)) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		 __srcu_read_unlock_fast_na(ssp, scp);
+		local_irq_restore(flags);
+	} else if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE)) {
 		this_cpu_inc(scp->srcu_unlocks.counter);  // Z, and implicit RCU reader.
-	else
+	} else {
 		atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks));  // Z, and implicit RCU reader.
+	}
 }
 
 void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor);
-- 
2.40.1




* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-05 20:32 ` [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64 Paul E. McKenney
@ 2025-11-08 13:07   ` Will Deacon
  2025-11-08 18:38     ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2025-11-08 13:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, Catalin Marinas,
	Mark Rutland, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	linux-arm-kernel, bpf

Hi Paul,

On Wed, Nov 05, 2025 at 12:32:15PM -0800, Paul E. McKenney wrote:
> Some arm64 platforms have slow per-CPU atomic operations, for example,
> the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> per-CPU atomic operations to interrupt-disabled
> non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
> is not invoked from NMI handlers.  This means that
> srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
> exclude themselves and each other simply by disabling interrupts
> across the counter update.
> 
> This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> it sure beats 100ns.
> 
> This command was used to measure the overhead:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: <linux-arm-kernel@lists.infradead.org>
> Cc: <bpf@vger.kernel.org>
> ---
>  include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 48 insertions(+), 3 deletions(-)

I've queued the per-cpu tweak from Catalin in the arm64 fixes tree [1]
for 6.18, so please can you drop this SRCU commit from your tree?

Cheers,

Will

[1] https://git.kernel.org/arm64/c/535fdfc5a228



* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-08 13:07   ` Will Deacon
@ 2025-11-08 18:38     ` Paul E. McKenney
  2025-11-10 11:24       ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2025-11-08 18:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: rcu, linux-kernel, kernel-team, rostedt, Catalin Marinas,
	Mark Rutland, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	linux-arm-kernel, bpf, frederic

On Sat, Nov 08, 2025 at 01:07:45PM +0000, Will Deacon wrote:
> Hi Paul,
> 
> On Wed, Nov 05, 2025 at 12:32:15PM -0800, Paul E. McKenney wrote:
> > Some arm64 platforms have slow per-CPU atomic operations, for example,
> > the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> > per-CPU atomic operations to interrupt-disabled
> > non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> > This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
> > is not invoked from NMI handlers.  This means that
> > srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
> > exclude themselves and each other simply by disabling interrupts
> > across the counter update.
> > 
> > This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> > srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> > Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> > it sure beats 100ns.
> > 
> > This command was used to measure the overhead:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: <linux-arm-kernel@lists.infradead.org>
> > Cc: <bpf@vger.kernel.org>
> > ---
> >  include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 48 insertions(+), 3 deletions(-)
> 
> I've queued the per-cpu tweak from Catalin in the arm64 fixes tree [1]
> for 6.18, so please can you drop this SRCU commit from your tree?

Very good!  Adding Frederic on CC since he is doing the pull request
for the upcoming merge window.

But if this doesn't show up in -rc1, we reserve the right to put it
back in.

Sorry, couldn't resist!   ;-)

							Thanx, Paul

> Cheers,
> 
> Will
> 
> [1] https://git.kernel.org/arm64/c/535fdfc5a228



* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-08 18:38     ` Paul E. McKenney
@ 2025-11-10 11:24       ` Will Deacon
  2025-11-10 17:29         ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2025-11-10 11:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, Catalin Marinas,
	Mark Rutland, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	linux-arm-kernel, bpf, frederic

On Sat, Nov 08, 2025 at 10:38:32AM -0800, Paul E. McKenney wrote:
> On Sat, Nov 08, 2025 at 01:07:45PM +0000, Will Deacon wrote:
> > On Wed, Nov 05, 2025 at 12:32:15PM -0800, Paul E. McKenney wrote:
> > > Some arm64 platforms have slow per-CPU atomic operations, for example,
> > > the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> > > per-CPU atomic operations to interrupt-disabled
> > > non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> > > This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
> > > is not invoked from NMI handlers.  This means that
> > > srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
> > > exclude themselves and each other simply by disabling interrupts
> > > across the counter update.
> > > 
> > > This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> > > srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> > > Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> > > it sure beats 100ns.
> > > 
> > > This command was used to measure the overhead:
> > > 
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > Cc: <linux-arm-kernel@lists.infradead.org>
> > > Cc: <bpf@vger.kernel.org>
> > > ---
> > >  include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 48 insertions(+), 3 deletions(-)
> > 
> > I've queued the per-cpu tweak from Catalin in the arm64 fixes tree [1]
> > for 6.18, so please can you drop this SRCU commit from your tree?
> 
> Very good!  Adding Frederic on CC since he is doing the pull request
> for the upcoming merge window.
> 
> But if this doesn't show up in -rc1, we reserve the right to put it
> back in.
> 
> Sorry, couldn't resist!   ;-)

I've merged it as a fix, so hopefully it will show up in v6.18-rc6.

Will



* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-10 11:24       ` Will Deacon
@ 2025-11-10 17:29         ` Paul E. McKenney
  2025-11-24 13:04           ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2025-11-10 17:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: rcu, linux-kernel, kernel-team, rostedt, Catalin Marinas,
	Mark Rutland, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	linux-arm-kernel, bpf, frederic

On Mon, Nov 10, 2025 at 11:24:07AM +0000, Will Deacon wrote:
> On Sat, Nov 08, 2025 at 10:38:32AM -0800, Paul E. McKenney wrote:
> > On Sat, Nov 08, 2025 at 01:07:45PM +0000, Will Deacon wrote:
> > > On Wed, Nov 05, 2025 at 12:32:15PM -0800, Paul E. McKenney wrote:
> > > > Some arm64 platforms have slow per-CPU atomic operations, for example,
> > > > the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> > > > per-CPU atomic operations to interrupt-disabled
> > > > non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> > > > This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
> > > > is not invoked from NMI handlers.  This means that
> > > > srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
> > > > exclude themselves and each other simply by disabling interrupts
> > > > across the counter update.
> > > > 
> > > > This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> > > > srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> > > > Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> > > > it sure beats 100ns.
> > > > 
> > > > This command was used to measure the overhead:
> > > > 
> > > > tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > Cc: <linux-arm-kernel@lists.infradead.org>
> > > > Cc: <bpf@vger.kernel.org>
> > > > ---
> > > >  include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 48 insertions(+), 3 deletions(-)
> > > 
> > > I've queued the per-cpu tweak from Catalin in the arm64 fixes tree [1]
> > > for 6.18, so please can you drop this SRCU commit from your tree?
> > 
> > Very good!  Adding Frederic on CC since he is doing the pull request
> > for the upcoming merge window.
> > 
> > But if this doesn't show up in -rc1, we reserve the right to put it
> > back in.
> > 
> > Sorry, couldn't resist!   ;-)
> 
> I've merged it as a fix, so hopefully it will show up in v6.18-rc6.

Even better, thank you!!!

							Thanx, Paul



* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-10 17:29         ` Paul E. McKenney
@ 2025-11-24 13:04           ` Will Deacon
  2025-11-24 17:20             ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2025-11-24 13:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, Catalin Marinas,
	Mark Rutland, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	linux-arm-kernel, bpf, frederic

On Mon, Nov 10, 2025 at 09:29:43AM -0800, Paul E. McKenney wrote:
> On Mon, Nov 10, 2025 at 11:24:07AM +0000, Will Deacon wrote:
> > On Sat, Nov 08, 2025 at 10:38:32AM -0800, Paul E. McKenney wrote:
> > > On Sat, Nov 08, 2025 at 01:07:45PM +0000, Will Deacon wrote:
> > > > On Wed, Nov 05, 2025 at 12:32:15PM -0800, Paul E. McKenney wrote:
> > > > > Some arm64 platforms have slow per-CPU atomic operations, for example,
> > > > > the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> > > > > per-CPU atomic operations to interrupt-disabled
> > > > > non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> > > > > This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
> > > > > is not invoked from NMI handlers.  This means that
> > > > > srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
> > > > > exclude themselves and each other simply by disabling interrupts
> > > > > across the counter update.
> > > > > 
> > > > > This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> > > > > srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> > > > > Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> > > > > it sure beats 100ns.
> > > > > 
> > > > > This command was used to measure the overhead:
> > > > > 
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > > Cc: Will Deacon <will@kernel.org>
> > > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > > Cc: <linux-arm-kernel@lists.infradead.org>
> > > > > Cc: <bpf@vger.kernel.org>
> > > > > ---
> > > > >  include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 48 insertions(+), 3 deletions(-)
> > > > 
> > > > I've queued the per-cpu tweak from Catalin in the arm64 fixes tree [1]
> > > > for 6.18, so please can you drop this SRCU commit from your tree?
> > > 
> > > Very good!  Adding Frederic on CC since he is doing the pull request
> > > for the upcoming merge window.
> > > 
> > > But if this doesn't show up in -rc1, we reserve the right to put it
> > > back in.
> > > 
> > > Sorry, couldn't resist!   ;-)
> > 
> > I've merged it as a fix, so hopefully it will show up in v6.18-rc6.
> 
> Even better, thank you!!!

It landed in Linus' tree here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?id=535fdfc5a228524552ee8810c9175e877e127c27

Please can you drop the SRCU change from -next? It still shows up in
20251121.

Will



* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-24 13:04           ` Will Deacon
@ 2025-11-24 17:20             ` Paul E. McKenney
  2025-11-24 22:47               ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2025-11-24 17:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: rcu, linux-kernel, kernel-team, rostedt, Catalin Marinas,
	Mark Rutland, Mathieu Desnoyers, Sebastian Andrzej Siewior,
	linux-arm-kernel, bpf, frederic

On Mon, Nov 24, 2025 at 01:04:20PM +0000, Will Deacon wrote:
> On Mon, Nov 10, 2025 at 09:29:43AM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 10, 2025 at 11:24:07AM +0000, Will Deacon wrote:
> > > On Sat, Nov 08, 2025 at 10:38:32AM -0800, Paul E. McKenney wrote:
> > > > On Sat, Nov 08, 2025 at 01:07:45PM +0000, Will Deacon wrote:
> > > > > On Wed, Nov 05, 2025 at 12:32:15PM -0800, Paul E. McKenney wrote:
> > > > > > Some arm64 platforms have slow per-CPU atomic operations, for example,
> > > > > > the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> > > > > > per-CPU atomic operations to interrupt-disabled
> > > > > > non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> > > > > > This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
> > > > > > is not invoked from NMI handlers.  This means that
> > > > > > srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
> > > > > > exclude themselves and each other simply by disabling interrupts
> > > > > > across the counter update.
> > > > > > 
> > > > > > This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> > > > > > srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> > > > > > Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> > > > > > it sure beats 100ns.
> > > > > > 
> > > > > > This command was used to measure the overhead:
> > > > > > 
> > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > > > Cc: Will Deacon <will@kernel.org>
> > > > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > > > Cc: <linux-arm-kernel@lists.infradead.org>
> > > > > > Cc: <bpf@vger.kernel.org>
> > > > > > ---
> > > > > >  include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
> > > > > >  1 file changed, 48 insertions(+), 3 deletions(-)
> > > > > 
> > > > > I've queued the per-cpu tweak from Catalin in the arm64 fixes tree [1]
> > > > > for 6.18, so please can you drop this SRCU commit from your tree?
> > > > 
> > > > Very good!  Adding Frederic on CC since he is doing the pull request
> > > > for the upcoming merge window.
> > > > 
> > > > But if this doesn't show up in -rc1, we reserve the right to put it
> > > > back in.
> > > > 
> > > > Sorry, couldn't resist!   ;-)
> > > 
> > > I've merged it as a fix, so hopefully it will show up in v6.18-rc6.
> > 
> > Even better, thank you!!!
> 
> It landed in Linus' tree here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?id=535fdfc5a228524552ee8810c9175e877e127c27

Again, thank you, and Breno has started backporting it for use in
our fleet.

> Please can you drop the SRCU change from -next? It still shows up in
> 20251121.

This one?

11f748499236 ("srcu: Optimize SRCU-fast-updown for arm64")

if so, Frederic, could you please drop this commit?

							Thanx, Paul



* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-24 17:20             ` Paul E. McKenney
@ 2025-11-24 22:47               ` Frederic Weisbecker
  2025-11-25 11:40                 ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2025-11-24 22:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, rcu, linux-kernel, kernel-team, rostedt,
	Catalin Marinas, Mark Rutland, Mathieu Desnoyers,
	Sebastian Andrzej Siewior, linux-arm-kernel, bpf

On Mon, Nov 24, 2025 at 09:20:25AM -0800, Paul E. McKenney wrote:
> On Mon, Nov 24, 2025 at 01:04:20PM +0000, Will Deacon wrote:
> > On Mon, Nov 10, 2025 at 09:29:43AM -0800, Paul E. McKenney wrote:
> > > On Mon, Nov 10, 2025 at 11:24:07AM +0000, Will Deacon wrote:
> > > > On Sat, Nov 08, 2025 at 10:38:32AM -0800, Paul E. McKenney wrote:
> > > > > On Sat, Nov 08, 2025 at 01:07:45PM +0000, Will Deacon wrote:
> > > > > > On Wed, Nov 05, 2025 at 12:32:15PM -0800, Paul E. McKenney wrote:
> > > > > > > Some arm64 platforms have slow per-CPU atomic operations, for example,
> > > > > > > the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> > > > > > > per-CPU atomic operations to interrupt-disabled
> > > > > > > non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> > > > > > > This works because SRCU-fast-updown, unlike srcu_read_unlock_fast(),
> > > > > > > is not invoked from NMI handlers.  This means that
> > > > > > > srcu_read_lock_fast_updown() and srcu_read_unlock_fast_updown() can
> > > > > > > exclude themselves and each other simply by disabling interrupts
> > > > > > > across the counter update.
> > > > > > > 
> > > > > > > This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> > > > > > > srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> > > > > > > Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> > > > > > > it sure beats 100ns.
> > > > > > > 
> > > > > > > This command was used to measure the overhead:
> > > > > > > 
> > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> > > > > > > 
> > > > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > > > > Cc: Will Deacon <will@kernel.org>
> > > > > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > > > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > > > > Cc: <linux-arm-kernel@lists.infradead.org>
> > > > > > > Cc: <bpf@vger.kernel.org>
> > > > > > > ---
> > > > > > >  include/linux/srcutree.h | 51 +++++++++++++++++++++++++++++++++++++---
> > > > > > >  1 file changed, 48 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > I've queued the per-cpu tweak from Catalin in the arm64 fixes tree [1]
> > > > > > for 6.18, so please can you drop this SRCU commit from your tree?
> > > > > 
> > > > > Very good!  Adding Frederic on CC since he is doing the pull request
> > > > > for the upcoming merge window.
> > > > > 
> > > > > But if this doesn't show up in -rc1, we reserve the right to put it
> > > > > back in.
> > > > > 
> > > > > Sorry, couldn't resist!   ;-)
> > > > 
> > > > I've merged it as a fix, so hopefully it will show up in v6.18-rc6.
> > > 
> > > Even better, thank you!!!
> > 
> > It landed in Linus' tree here:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?id=535fdfc5a228524552ee8810c9175e877e127c27
> 
> Again, thank you, and Breno has started backporting it for use in
> our fleet.
> 
> > Please can you drop the SRCU change from -next? It still shows up in
> > 20251121.
> 
> This one?
> 
> 11f748499236 ("srcu: Optimize SRCU-fast-updown for arm64")
> 
> if so, Frederic, could you please drop this commit?

Dropped, thanks!

(And I'm glad to do so given how error-prone it can be).

-- 
Frederic Weisbecker
SUSE Labs



* Re: [PATCH v2 15/16] srcu: Optimize SRCU-fast-updown for arm64
  2025-11-24 22:47               ` Frederic Weisbecker
@ 2025-11-25 11:40                 ` Will Deacon
  0 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2025-11-25 11:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	Catalin Marinas, Mark Rutland, Mathieu Desnoyers,
	Sebastian Andrzej Siewior, linux-arm-kernel, bpf

On Mon, Nov 24, 2025 at 11:47:35PM +0100, Frederic Weisbecker wrote:
> On Mon, Nov 24, 2025 at 09:20:25AM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 24, 2025 at 01:04:20PM +0000, Will Deacon wrote:
> > > It landed in Linus' tree here:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64?id=535fdfc5a228524552ee8810c9175e877e127c27
> > 
> > Again, thank you, and Breno has started backporting it for use in
> > our fleet.
> > 
> > > Please can you drop the SRCU change from -next? It still shows up in
> > > 20251121.
> > 
> > This one?
> > 
> > 11f748499236 ("srcu: Optimize SRCU-fast-updown for arm64")
> > 
> > if so, Frederic, could you please drop this commit?
> 
> Dropped, thanks!
> 
> (And I'm glad to do so given how error-prone it can be).

Thank you, both!

Will



