From: Ankur Arora <ankur.a.arora@oracle.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
linux-kernel@vger.kernel.org,
Linux-Arch <linux-arch@vger.kernel.org>,
linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
bpf@vger.kernel.org, Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mark Rutland <mark.rutland@arm.com>,
Haris Okanovic <harisokn@amazon.com>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
Alexei Starovoitov <ast@kernel.org>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
zhenglifeng1@huawei.com, xueshuai@linux.alibaba.com,
Joao Martins <joao.m.martins@oracle.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: [RESEND PATCH v7 1/7] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Date: Mon, 03 Nov 2025 13:41:04 -0800 [thread overview]
Message-ID: <87a512eqvj.fsf@oracle.com> (raw)
In-Reply-To: <84914748-4bf9-44a5-9e08-80528ca27177@app.fastmail.com>
Arnd Bergmann <arnd@arndb.de> writes:
> On Wed, Oct 29, 2025, at 04:17, Ankur Arora wrote:
>> Arnd Bergmann <arnd@arndb.de> writes:
>>> On Tue, Oct 28, 2025, at 06:31, Ankur Arora wrote:
>>> The FEAT_WFXT version would then look something like
>>>
>>> static inline void __cmpwait_u64_timeout(volatile u64 *ptr, unsigned long val, __u64 ns)
>>> {
>>> unsigned long tmp;
>>> asm volatile ("sev; wfe; ldxr; eor; cbnz; wfet; 1:"
>>> : "=&r" (tmp), "+Q" (*ptr)
>>> : "r" (val), "r" (ns));
>>> }
>>> #define cpu_poll_relax_timeout_wfet(__PTR, VAL, TIMECHECK) \
>>> ({ \
>>> u64 __t = TIMECHECK;
>>> if (__t)
>>> __cmpwait_u64_timeout(__PTR, VAL, __t);
>>> })
>>>
>>> while the 'wfe' version would continue to do the timecheck after the
>>> wait.
>>
>> I think this is a good way to do it if we need the precision
>> at some point in the future.
>
> I'm sorry I wrote this in a confusing way, but I really don't
> care about precision here, but for power savings reasons I
> would like to use a wfe/wfet based wait here (like you do)
> while at the same time being able to turn off the event
> stream entirely as long as FEAR_WFXT is available, saving
> additional power.
>
>>> I have two lesser concerns with the generic definition here:
>>>
>>> - having both a timeout and a spin counter in the same loop
>>> feels redundant and error-prone, as the behavior in practice
>>> would likely depend a lot on the platform. What is the reason
>>> for keeping the counter if we already have a fixed timeout
>>> condition?
>>
>> The main reason was that the time check is expensive in power terms.
>> Which is fine for platforms with a WFE like primitive but others
>> want to do the time check only infrequently. That's why poll_idle()
>> introduced a rate limit on polling (which the generic definition
>> reused here.)
>
> Right, I misunderstood how this works, so this part is fine.
>
>>> - I generally dislike the type-agnostic macros like this one,
>>> it adds a lot of extra complexity here that I feel can be
>>> completely avoided if we make explicitly 32-bit and 64-bit
>>> wide versions of these macros. We probably won't be able
>>> to resolve this as part of your series, but ideally I'd like
>>> have explicitly-typed versions of cmpxchg(), smp_load_acquire()
>>> and all the related ones, the same way we do for atomic_*()
>>> and atomic64_*().
>>
>> Ah. And the caller uses say smp_load_acquire_long() or whatever, and
>> that resolves to whatever makes sense for the arch.
>>
>> The __unqual_scalar_typeof() does look pretty ugly when looking at the
>> preprocesed version but other than that smp_cond_load() etc look
>> pretty straight forward. Just for my curiousity could you elaborate on
>> the complexity?
>
> The nested macros with __unqual_scalar_typeof() make the
> preprocessed version completely unreadable, especially when
> combined with other sets of complex macros like min()/max(),
> tracepoints or arm64 atomics.
>
> If something goes wrong inside of it, it becomes rather
> hard for people to debug or even read the compiler warnings.
> Since I do a lot of build testing, I do tend to run into those
> more than others do.
>
> I've also seen cases where excessively complex macros get
> nested to the point of slowing down the kernel build, which
> can happen once the nesting expands to megabytes of source
> code.
Thanks for the detail. Yeah, looking at the preprocessed code (as I had
to do when working on this) it all really feels quite excessive.
Combined with other complex macro constructs ...
--
ankur
next prev parent reply other threads:[~2025-11-03 21:41 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-28 5:31 [RESEND PATCH v7 0/7] barrier: Add smp_cond_load_*_timeout() Ankur Arora
2025-10-28 5:31 ` [RESEND PATCH v7 1/7] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
2025-10-28 9:42 ` Arnd Bergmann
2025-10-29 3:17 ` Ankur Arora
2025-11-02 21:52 ` Arnd Bergmann
2025-11-03 21:41 ` Ankur Arora [this message]
2025-10-28 16:13 ` Christoph Lameter (Ampere)
2025-10-28 5:31 ` [RESEND PATCH v7 2/7] arm64: barrier: Support smp_cond_load_relaxed_timeout() Ankur Arora
2025-10-28 8:42 ` Arnd Bergmann
2025-10-28 16:21 ` Christoph Lameter (Ampere)
2025-10-28 18:01 ` Ankur Arora
2025-10-28 21:17 ` Catalin Marinas
2025-11-02 21:39 ` Arnd Bergmann
2025-11-03 21:00 ` Ankur Arora
2025-11-04 13:55 ` Catalin Marinas
2025-11-05 8:27 ` Ankur Arora
2025-11-05 10:37 ` Arnd Bergmann
2025-11-06 0:36 ` Ankur Arora
2025-10-28 5:31 ` [RESEND PATCH v7 3/7] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait() Ankur Arora
2025-10-28 5:31 ` [RESEND PATCH v7 4/7] asm-generic: barrier: Add smp_cond_load_acquire_timeout() Ankur Arora
2025-10-28 5:31 ` [RESEND PATCH v7 5/7] atomic: Add atomic_cond_read_*_timeout() Ankur Arora
2025-10-28 5:31 ` [RESEND PATCH v7 6/7] rqspinlock: Use smp_cond_load_acquire_timeout() Ankur Arora
2025-10-28 5:31 ` [RESEND PATCH v7 7/7] cpuidle/poll_state: Poll via smp_cond_load_relaxed_timeout() Ankur Arora
2025-10-28 12:30 ` Rafael J. Wysocki
2025-10-29 4:41 ` Ankur Arora
2025-10-29 18:53 ` Rafael J. Wysocki
2025-10-29 19:13 ` Ankur Arora
2025-10-29 20:29 ` Rafael J. Wysocki
2025-10-29 21:01 ` Ankur Arora
2025-11-04 18:07 ` Rafael J. Wysocki
2025-11-05 8:30 ` Ankur Arora
2025-10-28 16:16 ` Christoph Lameter (Ampere)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a512eqvj.fsf@oracle.com \
--to=ankur.a.arora@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=ast@kernel.org \
--cc=boris.ostrovsky@oracle.com \
--cc=bpf@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=cl@gentwo.org \
--cc=daniel.lezcano@linaro.org \
--cc=harisokn@amazon.com \
--cc=joao.m.martins@oracle.com \
--cc=konrad.wilk@oracle.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=memxor@gmail.com \
--cc=peterz@infradead.org \
--cc=rafael@kernel.org \
--cc=will@kernel.org \
--cc=xueshuai@linux.alibaba.com \
--cc=zhenglifeng1@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.