Re: [PATCH v3 1/5] asm-generic: barrier: Add smp_cond_load_relaxed_timewait()

From: Catalin Marinas <catalin.marinas@arm.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, bpf@vger.kernel.org,
	arnd@arndb.de, will@kernel.org, peterz@infradead.org,
	akpm@linux-foundation.org, mark.rutland@arm.com,
	harisokn@amazon.com, cl@gentwo.org, ast@kernel.org,
	memxor@gmail.com, zhenglifeng1@huawei.com,
	xueshuai@linux.alibaba.com, joao.m.martins@oracle.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	rafael@kernel.org, daniel.lezcano@linaro.org
Subject: Re: [PATCH v3 1/5] asm-generic: barrier: Add smp_cond_load_relaxed_timewait()
Date: Tue, 19 Aug 2025 11:34:47 +0100
Message-ID: <aKRTRyQAaWFtRvDv@arm.com>
In-Reply-To: <87sehotp9q.fsf@oracle.com>

On Mon, Aug 18, 2025 at 12:15:29PM -0700, Ankur Arora wrote:
> Catalin Marinas <catalin.marinas@arm.com> writes:
> > On Sun, Aug 17, 2025 at 03:14:26PM -0700, Ankur Arora wrote:
> >> __cmpwait_relaxed() will need adjustment to set a deadline for WFET.
> >
> > Yeah, __cmpwait_relaxed() doesn't use WFET as it doesn't need a timeout
> > (it just happens to have one with the event stream).
> >
> > We could extend this or create a new one that uses WFET and takes an
> > argument. If extending this one, for example a timeout argument of 0
> > means WFE, non-zero means WFET cycles. This adds a couple more
> > instructions.
> 
> Though then we would need an ALTERNATIVE for WFET to fall back to WFE where
> it is not available. This is a minor point, but how about just always using
> WFE or WFET as appropriate instead of choosing between the two based on
> etime?
> 
>   static inline void __cmpwait_case_##sz(volatile void *ptr,              \
>                                   unsigned long val,                      \
>                                   unsigned long etime)                    \
>   {                                                                       \
>           unsigned long tmp;                                              \
>                                                                           \
>           const unsigned long ecycles = xloops_to_cycles(nsecs_to_xloops(etime)); \
>           asm volatile(                                                   \
>           "       sevl\n"                                                 \
>           "       wfe\n"                                                  \
>           "       ldxr" #sfx "\t%" #w "[tmp], %[v]\n"                     \
>           "       eor     %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n"     \
>           "       cbnz    %" #w "[tmp], 1f\n"                             \
>           ALTERNATIVE("wfe\n",                                            \
>                   "msr s0_3_c1_c0_0, %[ecycles]\n",                       \
>                   ARM64_HAS_WFXT)                                         \
>           "1:"                                                            \
>           : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr)                   \
>           : [val] "r" (val), [ecycles] "r" (ecycles));                    \
>   }
> 
> This would cause us to compute the end time unnecessarily for WFE but,
> given that nothing will use the output of that computation, wouldn't
> WFE be able to execute before the result of that computation is available?
> (Though I guess WFE is somewhat special, so the usual rules might not
> apply.)

The compiler cannot tell what's happening inside the asm block, so it
will compute ecycles and place it in a register before the asm. The
hardware won't do anything smarter either, such as skipping the
computation because the register holding ecycles is not going to be used
(or is going to be rewritten later). So I wouldn't want to penalise the
existing smp_cond_load_acquire(), which only needs a WFE.

We could patch WFET in and always pass -1UL in the non-timeout case but
I think we are better off just duplicating the whole thing. It's going
to be inlined anyway, so it's not like we end up with lots of these
functions.
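
As a rough, portable illustration of the "duplicate it" option (this is
not the arm64 implementation), the sketch below keeps two separate
helpers so that the no-timeout path never computes a deadline. The names
cmpwait_sketch(), cmpwait_timeout_sketch() and now_ns() are hypothetical,
C11 atomics stand in for the exclusive load, and plain polling stands in
for WFE/WFET.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * No-timeout variant: returns once *ptr != val. Nothing related to a
 * deadline is computed here, so existing callers pay no extra cost.
 */
static inline void cmpwait_sketch(_Atomic unsigned long *ptr,
				  unsigned long val)
{
	while (atomic_load_explicit(ptr, memory_order_relaxed) == val)
		;	/* polling stands in for WFE */
}

/*
 * Timeout variant: returns true if *ptr != val was observed, false if
 * timeout_ns elapsed first. Only this path computes a deadline.
 */
static inline bool cmpwait_timeout_sketch(_Atomic unsigned long *ptr,
					  unsigned long val,
					  uint64_t timeout_ns)
{
	const uint64_t deadline = now_ns() + timeout_ns;

	while (atomic_load_explicit(ptr, memory_order_relaxed) == val) {
		if (now_ns() >= deadline)	/* polling stands in for WFET */
			return false;
	}
	return true;
}
```

With both helpers inlined, a caller that does not need a timeout ends up
with only the first loop in its code, which is the property I'd like to
keep for smp_cond_load_acquire().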

-- 
Catalin
