From: Willy Tarreau <w@1wt.eu>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org
Subject: Re: Overhead of arm64 LSE per-CPU atomics?
Date: Sat, 1 Nov 2025 10:44:48 +0100 [thread overview]
Message-ID: <20251101094448.GA28965@1wt.eu> (raw)
In-Reply-To: <e819db66-7f60-464d-9ee8-4e8ee3e59acf@paulmck-laptop>

Hi!

On Fri, Oct 31, 2025 at 08:25:07PM -0700, Paul E. McKenney wrote:
> > > -----------------8<------------------------
> > > diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
> > > index 9abcc8ef3087..e381034324e1 100644
> > > --- a/arch/arm64/include/asm/percpu.h
> > > +++ b/arch/arm64/include/asm/percpu.h
> > > @@ -70,6 +70,7 @@ __percpu_##name##_case_##sz(void *ptr, unsigned long val) \
> > > unsigned int loop; \
> > > u##sz tmp; \
> > > \
> > > + asm volatile("prfm pstl1strm, %a0\n" : : "p" (ptr)); \
> > > asm volatile (ARM64_LSE_ATOMIC_INSN( \
> > > /* LL/SC */ \
> > > "1: ldxr" #sfx "\t%" #w "[tmp], %[ptr]\n" \
> > > @@ -91,6 +92,7 @@ __percpu_##name##_return_case_##sz(void *ptr, unsigned long val) \
> > > unsigned int loop; \
> > > u##sz ret; \
> > > \
> > > + asm volatile("prfm pstl1strm, %a0\n" : : "p" (ptr)); \
> > > asm volatile (ARM64_LSE_ATOMIC_INSN( \
> > > /* LL/SC */ \
> > > "1: ldxr" #sfx "\t%" #w "[ret], %[ptr]\n" \
> > > -----------------8<------------------------
> >
> > I will give this a shot, thank you!
>
> Jackpot!!!
>
> This reduces the overhead to 8.427, which is significantly better than
> the non-LSE value of 9.853. Still room for improvement, but much
> better than the 100ns values.
This is super interesting! I've blindly applied a similar change to all
of our atomics in haproxy and am seeing a consistent 2-7% performance
increase depending on the test, on an 80-core Ampere Altra (Neoverse-N1).
There as well we make heavy use of atomics to read/update mostly local
variables, since we avoid sharing as much as possible. I'm pretty sure
the prefetch does hurt in certain cases, and we don't have the per_cpu
distinction that exists here, but that makes me think about adding a
"mostly local" variant that we could choose depending on the context.
I'll continue to experiment; thanks for sharing this trick (particularly
to Yicong Yang, the original reporter).
Willy
Thread overview: 46+ messages
2025-10-30 22:37 Overhead of arm64 LSE per-CPU atomics? Paul E. McKenney
2025-10-31 18:30 ` Catalin Marinas
2025-10-31 19:39 ` Paul E. McKenney
2025-10-31 22:21 ` Paul E. McKenney
2025-10-31 22:43 ` Catalin Marinas
2025-10-31 23:38 ` Paul E. McKenney
2025-11-01 3:25 ` Paul E. McKenney
2025-11-01 9:44 ` Willy Tarreau [this message]
2025-11-01 18:07 ` Paul E. McKenney
2025-11-01 11:23 ` Catalin Marinas
2025-11-01 11:41 ` Yicong Yang
2025-11-05 13:25 ` Catalin Marinas
2025-11-05 13:42 ` Willy Tarreau
2025-11-05 14:49 ` Catalin Marinas
2025-11-05 16:21 ` Breno Leitao
2025-11-06 7:44 ` Willy Tarreau
2025-11-06 13:53 ` Catalin Marinas
2025-11-06 14:16 ` Willy Tarreau
2025-11-03 20:12 ` Palmer Dabbelt
2025-11-03 21:49 ` Catalin Marinas
2025-11-03 21:56 ` Willy Tarreau
2025-11-04 17:05 ` Catalin Marinas
2025-11-04 18:43 ` Paul E. McKenney
2025-11-04 20:10 ` Paul E. McKenney
2025-11-05 15:34 ` Catalin Marinas
2025-11-05 16:25 ` Paul E. McKenney
2025-11-05 17:15 ` Catalin Marinas
2025-11-05 17:40 ` Paul E. McKenney
2025-11-05 19:16 ` Catalin Marinas
2025-11-05 19:47 ` Paul E. McKenney
2025-11-05 20:17 ` Catalin Marinas
2025-11-05 20:45 ` Paul E. McKenney
2025-11-05 21:13 ` Palmer Dabbelt
2025-11-06 14:00 ` Catalin Marinas
2025-11-06 16:30 ` Palmer Dabbelt
2025-11-06 17:54 ` Catalin Marinas
2025-11-06 18:23 ` Palmer Dabbelt
2025-11-04 15:59 ` Breno Leitao
2025-11-04 17:06 ` Catalin Marinas
2025-11-04 18:08 ` Willy Tarreau
2025-11-04 18:22 ` Breno Leitao
2025-11-04 20:13 ` Paul E. McKenney
2025-11-04 20:35 ` Willy Tarreau
2025-11-04 21:25 ` Paul E. McKenney
2025-11-04 20:57 ` Puranjay Mohan
2025-11-27 12:29 ` Wentao Guan