Re: [RFT PATCH] arm64: atomics: prefetch the destination prior to LSE operations

Linux-ARM-Kernel Archive on lore.kernel.org

From: Will Deacon <will@kernel.org>
To: Yicong Yang <yangyicong@huawei.com>
Cc: mark.rutland@arm.com, catalin.marinas@arm.com, maz@kernel.org,
	broonie@kernel.org, linux-arm-kernel@lists.infradead.org,
	wangkefeng.wang@huawei.com, baohua@kernel.org,
	jonathan.cameron@huawei.com,
	shameerali.kolothum.thodi@huawei.com, prime.zeng@hisilicon.com,
	xuwei5@huawei.com, yangyicong@hisilicon.com, linuxarm@huawei.com,
	tiantao6@hisilicon.com
Subject: Re: [RFT PATCH] arm64: atomics: prefetch the destination prior to LSE operations
Date: Fri, 8 Aug 2025 12:35:57 +0100
Message-ID: <aJXhHVEIdafxcLP_@willie-the-truck>
In-Reply-To: <20250724120651.27983-1-yangyicong@huawei.com>

On Thu, Jul 24, 2025 at 08:06:51PM +0800, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> commit 0ea366f5e1b6 ("arm64: atomics: prefetch the destination word for write prior to stxr")
> adds a prefetch prior to LL/SC operations for performance reasons, since
> changing the cacheline state to exclusive can be costly. The same is
> true for LSE operations, so prefetch the destination prior to LSE
> operations as well.
> 
> Tested on my HIP08 server (2 x 64 CPUs) using `perf bench -r 100 futex all`,
> which stresses the spinlock of the futex hash bucket:
>                         6.16-rc7 patched improvement
> futex/hash(ops/sec)     171843   204757 +19.15%
> futex/wake(ms)          0.4630   0.4216 +8.94%
> futex/wake-parallel(ms) 0.0048   0.0039 +18.75%
> futex/requeue(ms)       0.1487   0.1508 -1.41%
> (2nd validation)                 0.1484 +0.2%
> futex/lock-pi(ops/sec)  125      126    +0.8%
> 
> For the single wake test with different numbers of threads, using
> `perf bench -r 100 futex wake -t <threads>` (times in ms):
> threads 6.16-rc7 patched improvement
> 1       0.0035   0.0032 +8.57%
> 48      0.1454   0.1221 +16.02%
> 96      0.3047   0.2304 +24.38%
> 160     0.5489   0.5012 +8.69%
> 192     0.6675   0.5906 +11.52%
> 256     0.9445   0.8092 +14.33%
> 
> There is some variation where the numbers are close, but overall the
> results look positive.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> 
> RFT for testing and feedback, since I'm not sure whether this is a general
> optimization or one that only helps some specific implementations.
> 
>  arch/arm64/include/asm/atomic_lse.h | 7 +++++++
>  arch/arm64/include/asm/cmpxchg.h    | 3 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
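The quoted patch puts a prefetch-for-write hint in front of each LSE atomic, like the earlier LL/SC change. A minimal user-space sketch of the idea, assuming GCC or Clang builtins rather than the kernel's actual atomic_lse.h macros (both helper names are hypothetical): on arm64, `__builtin_prefetch(p, 1, 0)` is typically emitted as `PRFM PSTL1STRM`, and with `-march=armv8.1-a` the relaxed fetch-add typically becomes a single `LDADD`.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helpers for illustration only; the patch itself changes
 * arch/arm64/include/asm/atomic_lse.h and cmpxchg.h.
 */

/* Plain atomic add: with LSE this can be a single LDADD. */
static inline long plain_fetch_add(_Atomic long *v, long i)
{
	return atomic_fetch_add_explicit(v, i, memory_order_relaxed);
}

/* The patch's idea: ask for the line in a writable state first. */
static inline long prefetch_then_fetch_add(_Atomic long *v, long i)
{
	assert(v); /* caller must pass a valid counter */
	/* rw = 1: prefetch for write; locality = 0: streaming hint. */
	__builtin_prefetch((const void *)v, 1, 0);
	return atomic_fetch_add_explicit(v, i, memory_order_relaxed);
}
```

Both helpers return the same values; any difference would only appear as timing under contention on real hardware, which is what the RFT benchmarks try to measure.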

One of the motivations behind rmw instructions (as opposed to ldxr/stxr
loops) is so that the atomic operation can be performed at different
places in the memory hierarchy depending upon where the data resides.
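To make the contrast concrete: an ldxr/stxr loop is a load, a modify and a conditional store that retries on failure, so the line is normally brought into the CPU's own cache, whereas a single rmw instruction such as LDADD names the whole operation and leaves the hardware free to choose where to execute it. A portable C11 sketch (function names hypothetical); note that on arm64 without LSE, `atomic_compare_exchange_weak_explicit` is itself lowered to an ldxr/stxr pair, so the retry loop is only an analogue of the kernel's LL/SC code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Load, modify, conditional store, retry: the shape of an LL/SC loop. */
static inline long retry_loop_fetch_add(_Atomic long *v, long i)
{
	long old = atomic_load_explicit(v, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the current value; try again. */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;
	return old;
}

/* The same operation as one request; with LSE this can be one LDADD. */
static inline long single_rmw_fetch_add(_Atomic long *v, long i)
{
	return atomic_fetch_add_explicit(v, i, memory_order_relaxed);
}
```

The loop's conditional store fails and retries whenever the value changes between the load and the store, which is part of why a single instruction is attractive under contention.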

For example, if a shared counter is sitting at a level of system cache,
it may be optimal to leave it there so that CPUs around the system can
post atomic increments to it without forcing the line up and down the
cache hierarchy every time.
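The shared-counter case looks like this from software: many threads post increments and none of them needs the old value. A minimal pthreads sketch, assuming a POSIX system (MAX_THREADS and the function names are hypothetical): because the result is discarded, an LSE-capable compiler may encode the add in its store-only form (STADD). Whether the add is then performed in the CPU's cache or further out, for example in a system-level cache, is decided by the hardware and its configuration, not by this code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64 /* hypothetical limit for this sketch */

struct counter_job {
	_Atomic long *counter; /* the single line every thread hits */
	long iters;            /* increments posted by each thread */
};

static void *post_increments(void *arg)
{
	struct counter_job *job = arg;

	for (long n = 0; n < job->iters; n++)
		/* Result unused, so an LSE build may use the store-only form. */
		(void)atomic_fetch_add_explicit(job->counter, 1,
						memory_order_relaxed);
	return NULL;
}

/* Runs nthreads workers on one counter; returns the total or -1 on error. */
long run_shared_counter(int nthreads, long iters)
{
	_Atomic long counter = 0;
	pthread_t tids[MAX_THREADS];
	struct counter_job job = { .counter = &counter, .iters = iters };
	int started = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	for (; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, post_increments,
				   &job) != 0)
			break;
	/* Join every thread that started before 'counter' goes out of scope. */
	for (int t = 0; t < started; t++)
		pthread_join(tids[t], NULL);
	if (started != nthreads)
		return -1;
	return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

The total is the same wherever the hardware chooses to perform the adds: run_shared_counter(4, 100000) returns 400000.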

So, although adding an L1 prefetch may help some specific benchmarks on
a specific system, I don't think this is generally a good idea for
scalability. The hardware should be able to figure out the best place to
do the operation and, if you have a system where that means it should
always be performed within the CPU, then you should probably configure
it not to send the atomic remotely rather than force that in the kernel
for everybody.

Will


Thread overview: 4+ messages
2025-07-24 12:06 [RFT PATCH] arm64: atomics: prefetch the destination prior to LSE operations Yicong Yang
2025-08-08 11:35 ` Will Deacon [this message]
2025-08-09  9:48   ` Yicong Yang
2025-11-06 22:23     ` Palmer Dabbelt

