From: Catalin Marinas <catalin.marinas@arm.com>
To: Willy Tarreau <w@1wt.eu>
Cc: Yicong Yang <yangyccccc@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org
Subject: Re: Overhead of arm64 LSE per-CPU atomics?
Date: Thu, 6 Nov 2025 13:53:04 +0000
Message-ID: <aQyoQMHA_2HuK9sH@arm.com>
In-Reply-To: <20251106074439.GB24713@1wt.eu>
On Thu, Nov 06, 2025 at 08:44:39AM +0100, Willy Tarreau wrote:
> Do you have pointers to some docs suggesting what instructions to use
> when you prefer a near or far operation, like here with stadd vs ldadd ?
Unfortunately, the architecture spec does not make any distinction
between far and near atomics; that's a microarchitecture and system
implementation detail. Some of the information is hidden in specific
CPU TRMs and the behaviour may differ between implementations. I hope
Arm will publish some docs/blogs to give guidance to software folks
(and to other, non-Arm Ltd microarchitects; it would be good if they
were all aligned, though some may see this as their value-add).
> Also does this mean that with LSE a pure store will always be far unless
> prefetched ? Or should we trick stores using stadd mem,0 / ldadd mem,0
> to hint a near vs far store for example ?
For the Arm Ltd implementations, _usually_ store-only atomics are
executed far while those returning a value are near. But that's subject
to implementation-defined configurations (e.g. IMP_CPUECTLR_EL1). Also
the hardware may try to be smarter, e.g. detect contention and switch
from one behaviour to another.
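To make the distinction concrete, a sketch of the two instruction
forms (my illustration, not from any Arm document, and the near/far
behaviour is only the "usual" one described above):

	/* value-returning atomic: the old value comes back in w2;
	 * usually executed near, pulling the line into the local cache */
	ldadd	w1, w2, [x0]	// w2 = *x0; *x0 += w1

	/* store-only atomic, an alias of "ldadd w1, wzr, [x0]";
	 * usually executed far, at a shared point of coherence */
	stadd	w1, [x0]	// *x0 += w1, nothing returned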
> I'm also wondering about CAS,
> if there's a way to perform the usual load+CAS sequence exclusively using
> far operations to avoid cache lines bouncing in contended environments,
> because there are cases where a constant 50-60ns per CAS would be awesome,
> or maybe even a CAS that remains far in case of failure or triggers the
> prefetch of the line in case of success, for the typical
> CAS(ptr, NULL, mine) used to try to own a shared resource.
Talking to other engineers in Arm, I learnt that the architecture even
describes a way for the programmer to hint at CAS loops. Instead of an
initial LDR, use something (informally) called ICAS - a CAS where the
Xs and Xt registers are the same (the actual registers, not the values
they contain). The in-memory value comparison with Xs either passes,
in which case the written value would be the same (implementation
defined whether a write actually takes place), or fails (in theory the
hardware is allowed to write the same old value back). So while the
value in Xs is largely irrelevant, CAS still returns the value in
memory. The hardware detects the ICAS+CAS construct and aims to make
it faster.
From section C6.2.50 of the Arm ARM (the CAS description):
For a CAS or CASA instruction, when <Ws> or <Xs> specifies the same
register as <Wt> or <Xt>, this signals to the memory system that an
additional subsequent CAS, CASA, CASAL, or CASL access to the
specified location is likely to occur in the near future. The memory
system can respond by taking actions that are expected to enable the
subsequent CAS, CASA, CASAL, or CASL access to succeed when it does
occur.
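Something like this is how I'd picture the construct (a rough,
untested sketch; the register choices, ordering and the increment are
mine):

	// Atomic increment of [x0] via a CAS loop, with the initial
	// load replaced by the ICAS form (Xs == Xt, here both x1).
	// The value in x1 is largely irrelevant: the CAS returns the
	// current memory value in x1 either way, while hinting to the
	// memory system that a real CAS on [x0] follows shortly.
		mov	x1, xzr
		casa	x1, x1, [x0]	// ICAS: x1 = *x0
	1:	add	x2, x1, #1	// new value
		mov	x3, x1		// expected (old) value
		casal	x3, x2, [x0]	// if *x0 == x3: *x0 = x2; x3 = old *x0
		cmp	x3, x1		// did the CAS succeed?
		mov	x1, x3		// refresh expected value (flags intact)
		b.ne	1b		// retry on failure

Whether the hint actually helps is of course implementation-dependent.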
I guess something to add to Breno's microbenchmarks.
--
Catalin