From: Catalin Marinas <catalin.marinas@arm.com>
To: Breno Leitao <leitao@debian.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org, kernel-team@meta.com,
rmikey@meta.com
Subject: Re: Overhead of arm64 LSE per-CPU atomics?
Date: Tue, 4 Nov 2025 17:06:14 +0000 [thread overview]
Message-ID: <aQoyhl_aJ8MFsmWE@arm.com> (raw)
In-Reply-To: <ahkk2r22peni4s7j6c7tnv3uajvwiaeg3vwyusppblcokpvgjw@zuuzipntgu7x>
Hi Breno,
On Tue, Nov 04, 2025 at 07:59:38AM -0800, Breno Leitao wrote:
> On Fri, Oct 31, 2025 at 06:30:31PM +0000, Catalin Marinas wrote:
> > On Thu, Oct 30, 2025 at 03:37:00PM -0700, Paul E. McKenney wrote:
> > > To make event tracing safe for PREEMPT_RT kernels, I have been creating
> > > optimized variants of SRCU readers that use per-CPU atomics. This works
> > > quite well, but on ARM Neoverse V2, I am seeing about 100ns for a
> > > srcu_read_lock()/srcu_read_unlock() pair, or about 50ns for a single
> > > per-CPU atomic operation. This contrasts with a handful of nanoseconds
> > > on x86 and similar on ARM for a atomic_set(&foo, atomic_read(&foo) + 1).
> >
> > That's quite a difference. Does it get any better if
> > CONFIG_ARM64_LSE_ATOMICS is disabled? We don't have a way to disable it
> > on the kernel command line.
> >
> > Depending on the implementation and configuration, the LSE atomics may
> > skip the L1 cache and be executed closer to the memory (they used to be
> > called far atomics). The CPUs try to be smarter like doing the operation
> > "near" if it's in the cache but the heuristics may not always work.
>
> I am trying to play with LSE latency and compare it with LL/SC usecase. I
> _think_ I have a reproducer in userspace
>
> I've create a simple userspace program to compare the latency of a atomic add
> using LL/SC and LSE, basically comparing the following two functions while
> executing without any contention (single thread doing the atomic operation -
> no atomic contention):
>
> static inline void __percpu_add_case_64_llsc(void *ptr, unsigned long val)
> {
> asm volatile(
> /* LL/SC */
> "1: ldxr %[tmp], %[ptr]\n"
> " add %[tmp], %[tmp], %[val]\n"
> " stxr %w[loop], %[tmp], %[ptr]\n"
> " cbnz %w[loop], 1b"
> : [loop] "=&r"(loop), [tmp] "=&r"(tmp), [ptr] "+Q"(*(u64 *)ptr)
> : [val] "r"((u64)(val))
> : "memory");
> }
>
> and
>
> /* LSE implementation */
> static inline void __percpu_add_case_64_lse(void *ptr, unsigned long val)
> {
> asm volatile(
> /* LSE atomics */
> " stadd %[val], %[ptr]\n"
> : [ptr] "+Q"(*(u64 *)ptr)
> : [val] "r"((u64)(val))
> : "memory");
> }
Could you try with an ldadd instead? See my reply to Paul a few minutes
ago.
Thanks.
--
Catalin
next prev parent reply other threads:[~2025-11-04 17:06 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-30 22:37 Overhead of arm64 LSE per-CPU atomics? Paul E. McKenney
2025-10-31 18:30 ` Catalin Marinas
2025-10-31 19:39 ` Paul E. McKenney
2025-10-31 22:21 ` Paul E. McKenney
2025-10-31 22:43 ` Catalin Marinas
2025-10-31 23:38 ` Paul E. McKenney
2025-11-01 3:25 ` Paul E. McKenney
2025-11-01 9:44 ` Willy Tarreau
2025-11-01 18:07 ` Paul E. McKenney
2025-11-01 11:23 ` Catalin Marinas
2025-11-01 11:41 ` Yicong Yang
2025-11-05 13:25 ` Catalin Marinas
2025-11-05 13:42 ` Willy Tarreau
2025-11-05 14:49 ` Catalin Marinas
2025-11-05 16:21 ` Breno Leitao
2025-11-06 7:44 ` Willy Tarreau
2025-11-06 13:53 ` Catalin Marinas
2025-11-06 14:16 ` Willy Tarreau
2025-11-03 20:12 ` Palmer Dabbelt
2025-11-03 21:49 ` Catalin Marinas
2025-11-03 21:56 ` Willy Tarreau
2025-11-04 17:05 ` Catalin Marinas
2025-11-04 18:43 ` Paul E. McKenney
2025-11-04 20:10 ` Paul E. McKenney
2025-11-05 15:34 ` Catalin Marinas
2025-11-05 16:25 ` Paul E. McKenney
2025-11-05 17:15 ` Catalin Marinas
2025-11-05 17:40 ` Paul E. McKenney
2025-11-05 19:16 ` Catalin Marinas
2025-11-05 19:47 ` Paul E. McKenney
2025-11-05 20:17 ` Catalin Marinas
2025-11-05 20:45 ` Paul E. McKenney
2025-11-05 21:13 ` Palmer Dabbelt
2025-11-06 14:00 ` Catalin Marinas
2025-11-06 16:30 ` Palmer Dabbelt
2025-11-06 17:54 ` Catalin Marinas
2025-11-06 18:23 ` Palmer Dabbelt
2025-11-04 15:59 ` Breno Leitao
2025-11-04 17:06 ` Catalin Marinas [this message]
2025-11-04 18:08 ` Willy Tarreau
2025-11-04 18:22 ` Breno Leitao
2025-11-04 20:13 ` Paul E. McKenney
2025-11-04 20:35 ` Willy Tarreau
2025-11-04 21:25 ` Paul E. McKenney
2025-11-04 20:57 ` Puranjay Mohan
2025-11-27 12:29 ` Wentao Guan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aQoyhl_aJ8MFsmWE@arm.com \
--to=catalin.marinas@arm.com \
--cc=kernel-team@meta.com \
--cc=leitao@debian.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=mark.rutland@arm.com \
--cc=paulmck@kernel.org \
--cc=rmikey@meta.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.