From: Catalin Marinas <catalin.marinas@arm.com>
To: Willy Tarreau <w@1wt.eu>
Cc: Yicong Yang <yangyccccc@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org
Subject: Re: Overhead of arm64 LSE per-CPU atomics?
Date: Wed, 5 Nov 2025 14:49:39 +0000 [thread overview]
Message-ID: <aQtkA0cOb6Fm9Mrc@arm.com> (raw)
In-Reply-To: <20251105134231.GF22848@1wt.eu>

On Wed, Nov 05, 2025 at 02:42:31PM +0100, Willy Tarreau wrote:
> On Wed, Nov 05, 2025 at 01:25:25PM +0000, Catalin Marinas wrote:
> > > But we need to add the prefetch in the per-cpu implementation as you've
> > > noticed above (I didn't add it since there is no prefetch in the LL/SC
> > > implementation there; maybe that's an omission?)
> >
> > Maybe no-one stressed these to notice any difference between LL/SC and
> > LSE.
>
> Huh? I can say for certain that LL/SC is a no-go beyond 16 cores,
> having seen catastrophic performance there with haproxy, while with LSE
> it continues to scale almost linearly at least up to 64.

I was referring only to the this_cpu_add() etc. functions (until Paul
started using them). There have definitely been lots of benchmarks on
the scalability of LL/SC; that's one of the reasons Arm added the LSE
atomics years ago.
> But that does not mean that, if recovering 90% of the atomic overhead
> in the uncontended case is within reach, we shouldn't try to grab it
> at a reasonable cost!

I agree. Even for these cases, I don't think the solution is LL/SC but
rather better use of LSE (and better understanding of the hardware
behaviour; feedback here should go both ways).
> I'm definitely adding in my todo list to experiment more on this on
> various CPUs now ;-)

Thanks for the tests so far; very insightful. I think it would still be
good to assess how PRFM+STADD compares to LDADD (without PRFM) in
Breno's microbenchmarks. I suspect LDADD is still better.

FWIW, Neoverse-N1 has an erratum affecting the far atomics, so they are
all forced near; this explains the consistent results you got with
STADD on this CPU. On other CPUs, STADD would likely be executed far
unless it hits in the L1 cache.

--
Catalin