From: Catalin Marinas <catalin.marinas@arm.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: paulmck@kernel.org, Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org
Subject: Re: Overhead of arm64 LSE per-CPU atomics?
Date: Thu, 6 Nov 2025 17:54:31 +0000
Message-ID: <aQzg1_uxNfCK8_V_@arm.com>
In-Reply-To: <mhng-4170B7AB-3975-451A-A177-3798DB46182F@palmerdabbelt-mac>
On Thu, Nov 06, 2025 at 08:30:05AM -0800, Palmer Dabbelt wrote:
> On Thu, 06 Nov 2025 06:00:59 PST (-0800), Catalin Marinas wrote:
> > On Wed, Nov 05, 2025 at 01:13:10PM -0800, Palmer Dabbelt wrote:
> > > I ran a bunch of cases with those:
> > [...]
> > > Which I'm interpreting to say the following:
> > >
> > > * LL/SC is pretty good for the common cases, but gets really bad under the
> > > pathological cases. It still seems always slower than LDADD.
> > > * STADD has latency that blocks other STADDs, but not other CPU-local work.
> > > I'd bet there's a bunch of interactions with caches and memory ordering
> > > here, but those would all just make STADD look worse, so I'm just ignoring
> > > them.
> > > * LDADD is better than STADD even under pathologically highly contended
> > > cases. I was actually kind of surprised about this one; I thought the far
> > > atomics would be better there.
> > > * The prefetches help STADD, but they don't seem to make it better than
> > > LDADD in any case.
> > > * The LDADD latency also happens concurrently with other CPU operations
> > > like the STADD latency does. It has less latency to hide, so the latency
> > > starts to go up with less extra work, but it's never worse than STADD.
> > >
> > > So I think at least on this system, LDADD is just always better.
> >
> > Thanks for this, very useful. I guess that's expected in light of what I
> > learnt from the other Arm engineers in the past couple of days.
>
> OK, sorry if I misunderstood you earlier. From reading your posts I thought
> there would be some mode in which STADD was better -- probably high
> contention and enough extra work to hide the latency. So I was kind of
> surprised to find these results.
I think STADD is better for cases where you update some stat counters
but do a lot of work in between. In your microbenchmark, lots of
back-to-back STADDs with only NOPs in between (rather than other memory
transactions) are likely to be slower. If these are real use-cases, at
some point the hardware may evolve to behave differently (or more
dynamically).
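
To make the comparison concrete, here is a minimal sketch (mine, not
code from this thread) of the two forms for a 64-bit counter, assuming
a toolchain that accepts LSE (e.g. built with -march=armv8-a+lse).
STADD is the LDADD encoding with XZR as the destination register, so
the only difference is whether the old value has to come back to the
core:

static inline void add_stadd(unsigned long *p, unsigned long v)
{
	/* result discarded; a candidate for "far" execution */
	asm volatile("stadd %[v], %[p]"
		     : [p] "+Q" (*p)
		     : [v] "r" (v));
}

static inline unsigned long add_ldadd(unsigned long *p, unsigned long v)
{
	unsigned long old;

	/* same add, but the old value is returned to the core */
	asm volatile("ldadd %[v], %[old], %[p]"
		     : [p] "+Q" (*p), [old] "=r" (old)
		     : [v] "r" (v));
	return old;
}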
BTW, Ola Liljedahl @ Arm pointed me at this collection of routines:
https://github.com/ARM-software/progress64/tree/master. Building it
with ATOMICS=yes makes the compiler generate LSE atomics for intrinsics
like __atomic_fetch_add(). It won't generate STADD because of some
aspects of the C memory consistency model (a DMB LD wouldn't guarantee
ordering with a prior STADD).
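
For example (again my sketch, not progress64 code), with GCC or Clang
and LSE enabled (e.g. -march=armv8-a+lse), a relaxed fetch-add on a
counter compiles down to a single LDADD:

#include <stdint.h>

static inline uint64_t stat_add(uint64_t *ctr, uint64_t v)
{
	/* typically lowered to a single ldadd when LSE is available */
	return __atomic_fetch_add(ctr, v, __ATOMIC_RELAXED);
}

Whether discarding the return value is enough to get the STADD form
back depends on the ordering constraints above.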
--
Catalin
Thread overview: 46+ messages
2025-10-30 22:37 Overhead of arm64 LSE per-CPU atomics? Paul E. McKenney
2025-10-31 18:30 ` Catalin Marinas
2025-10-31 19:39 ` Paul E. McKenney
2025-10-31 22:21 ` Paul E. McKenney
2025-10-31 22:43 ` Catalin Marinas
2025-10-31 23:38 ` Paul E. McKenney
2025-11-01 3:25 ` Paul E. McKenney
2025-11-01 9:44 ` Willy Tarreau
2025-11-01 18:07 ` Paul E. McKenney
2025-11-01 11:23 ` Catalin Marinas
2025-11-01 11:41 ` Yicong Yang
2025-11-05 13:25 ` Catalin Marinas
2025-11-05 13:42 ` Willy Tarreau
2025-11-05 14:49 ` Catalin Marinas
2025-11-05 16:21 ` Breno Leitao
2025-11-06 7:44 ` Willy Tarreau
2025-11-06 13:53 ` Catalin Marinas
2025-11-06 14:16 ` Willy Tarreau
2025-11-03 20:12 ` Palmer Dabbelt
2025-11-03 21:49 ` Catalin Marinas
2025-11-03 21:56 ` Willy Tarreau
2025-11-04 17:05 ` Catalin Marinas
2025-11-04 18:43 ` Paul E. McKenney
2025-11-04 20:10 ` Paul E. McKenney
2025-11-05 15:34 ` Catalin Marinas
2025-11-05 16:25 ` Paul E. McKenney
2025-11-05 17:15 ` Catalin Marinas
2025-11-05 17:40 ` Paul E. McKenney
2025-11-05 19:16 ` Catalin Marinas
2025-11-05 19:47 ` Paul E. McKenney
2025-11-05 20:17 ` Catalin Marinas
2025-11-05 20:45 ` Paul E. McKenney
2025-11-05 21:13 ` Palmer Dabbelt
2025-11-06 14:00 ` Catalin Marinas
2025-11-06 16:30 ` Palmer Dabbelt
2025-11-06 17:54 ` Catalin Marinas [this message]
2025-11-06 18:23 ` Palmer Dabbelt
2025-11-04 15:59 ` Breno Leitao
2025-11-04 17:06 ` Catalin Marinas
2025-11-04 18:08 ` Willy Tarreau
2025-11-04 18:22 ` Breno Leitao
2025-11-04 20:13 ` Paul E. McKenney
2025-11-04 20:35 ` Willy Tarreau
2025-11-04 21:25 ` Paul E. McKenney
2025-11-04 20:57 ` Puranjay Mohan
2025-11-27 12:29 ` Wentao Guan