From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, rostedt@goodmis.org,
	Mark Rutland <mark.rutland@arm.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-arm-kernel@lists.infradead.org, bpf@vger.kernel.org
Subject: Re: [PATCH 17/19] srcu: Optimize SRCU-fast-updown for arm64
Date: Mon, 3 Nov 2025 14:07:15 +0000
Message-ID: <aQi3E-3hoNWYocOl@arm.com>
In-Reply-To: <aQilZAXQv2P-b2eI@willie-the-truck>

On Mon, Nov 03, 2025 at 12:51:48PM +0000, Will Deacon wrote:
> On Sun, Nov 02, 2025 at 01:44:34PM -0800, Paul E. McKenney wrote:
> > Some arm64 platforms have slow per-CPU atomic operations, for example,
> > the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
> > per-CPU atomic operations to interrupt-disabled
> > non-read-modify-write-atomic atomic_read()/atomic_set() operations.
> > This works because SRCU-fast-updown is not invoked from NMI handlers,
> > which means that srcu_read_lock_fast_updown() and
> > srcu_read_unlock_fast_updown() can exclude themselves and each other
> > by disabling interrupts.
> > 
> > This reduces the overhead of calls to srcu_read_lock_fast_updown() and
> > srcu_read_unlock_fast_updown() from about 100ns to about 12ns on an ARM
> > Neoverse V2.  Although this is not excellent compared to about 2ns on x86,
> > it sure beats 100ns.
> > 
> > This command was used to measure the overhead:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64 CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000 refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30 torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown refscale.verbose_batched=8 torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: <linux-arm-kernel@lists.infradead.org>
> > Cc: <bpf@vger.kernel.org>
> > ---
> >  include/linux/srcutree.h | 56 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 51 insertions(+), 5 deletions(-)
> 
> [...]
> 
> > @@ -327,12 +355,23 @@ __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
> >  static inline
> >  struct srcu_ctr __percpu notrace *__srcu_read_lock_fast_updown(struct srcu_struct *ssp)
> >  {
> > -	struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
> > +	struct srcu_ctr __percpu *scp;
> >  
> > -	if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
> > +	if (IS_ENABLED(CONFIG_ARM64) && IS_ENABLED(CONFIG_ARM64_USE_LSE_PERCPU_ATOMICS)) {
> > +		unsigned long flags;
> > +
> > +		local_irq_save(flags);
> > +		scp = __srcu_read_lock_fast_na(ssp);
> > +		local_irq_restore(flags); /* Avoids leaking the critical section. */
> > +		return scp;
> > +	}
> 
> Do we still need to pursue this after Catalin's prefetch suggestion for the
> per-cpu atomics?
> 
> https://lore.kernel.org/r/aQU7l-qMKJTx4znJ@arm.com
> 
> Although disabling/enabling interrupts on your system seems to be
> significantly faster than an atomic instruction, I'm worried that it's
> all very SoC-specific and on a mobile part (especially with pseudo-NMI),
> the relative costs could easily be the other way around.

My preference would be to go for the percpu atomic prefetch, but we'd
need to do a bit of benchmarking to check that we don't break other
platforms (unlikely though).
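
For a rough first look at the comparison outside the kernel, here is a
minimal userspace sketch, not the kernel change itself. It times a
relaxed atomic increment with and without a preceding write-prefetch
hint. The names demo_counter, now_ns() and bench_inc() are made up for
illustration. __builtin_prefetch() is the GCC/Clang builtin, and it is
only an assumption that the compiler turns this into a PRFM followed by
an LSE atomic on arm64 (that depends on the compiler and on flags such
as -march). A single uncontended counter in userspace may well behave
differently from this_cpu_*() operations in the kernel.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a per-CPU counter; not a kernel type. */
static _Atomic unsigned long demo_counter;

/* Wall-clock time in ns (C11 timespec_get); a serious benchmark would
 * prefer a monotonic clock or a cycle counter. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Increment demo_counter "loops" times and return the elapsed time in ns.
 * If "prefetch" is non-zero, issue a write-prefetch hint before each
 * atomic increment.
 */
static uint64_t bench_inc(unsigned long loops, int prefetch)
{
	uint64_t start = now_ns();

	for (unsigned long i = 0; i < loops; i++) {
		if (prefetch)
			/* Second argument 1 = prefetch for write. */
			__builtin_prefetch((const void *)&demo_counter, 1);
		atomic_fetch_add_explicit(&demo_counter, 1,
					  memory_order_relaxed);
	}
	return now_ns() - start;
}
```

Calling bench_inc(loops, 0) and bench_inc(loops, 1) with the same loop
count on each platform of interest and comparing the returned times
gives a first approximation only; the numbers include loop and branch
overhead.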

-- 
Catalin
