From: Ankur Arora <ankur.a.arora@oracle.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
linux-kernel@vger.kernel.org,
Linux-Arch <linux-arch@vger.kernel.org>,
linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
bpf@vger.kernel.org, Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mark Rutland <mark.rutland@arm.com>,
Haris Okanovic <harisokn@amazon.com>,
"Christoph Lameter (Ampere)" <cl@gentwo.org>,
Alexei Starovoitov <ast@kernel.org>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
zhenglifeng1@huawei.com, xueshuai@linux.alibaba.com,
Joao Martins <joao.m.martins@oracle.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: [RESEND PATCH v7 1/7] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Date: Mon, 03 Nov 2025 13:41:04 -0800
Message-ID: <87a512eqvj.fsf@oracle.com>
In-Reply-To: <84914748-4bf9-44a5-9e08-80528ca27177@app.fastmail.com>

Arnd Bergmann <arnd@arndb.de> writes:
> On Wed, Oct 29, 2025, at 04:17, Ankur Arora wrote:
>> Arnd Bergmann <arnd@arndb.de> writes:
>>> On Tue, Oct 28, 2025, at 06:31, Ankur Arora wrote:
>>> The FEAT_WFXT version would then look something like
>>>
>>> static inline void __cmpwait_u64_timeout(volatile u64 *ptr, unsigned long val, __u64 ns)
>>> {
>>>	unsigned long tmp;
>>>
>>>	asm volatile ("sev; wfe; ldxr; eor; cbnz; wfet; 1:"
>>>		      : "=&r" (tmp), "+Q" (*ptr)
>>>		      : "r" (val), "r" (ns));
>>> }
>>> #define cpu_poll_relax_timeout_wfet(__PTR, VAL, TIMECHECK) \
>>> ({ \
>>>	u64 __t = TIMECHECK; \
>>>	if (__t) \
>>>		__cmpwait_u64_timeout(__PTR, VAL, __t); \
>>> })
>>>
>>> while the 'wfe' version would continue to do the timecheck after the
>>> wait.
>>
>> I think this is a good way to do it if we need the precision
>> at some point in the future.
>
> I'm sorry I wrote this in a confusing way, but I really don't
> care about precision here. Rather, for power-saving reasons I
> would like to use a wfe/wfet based wait here (like you do)
> while at the same time being able to turn off the event
> stream entirely as long as FEAT_WFXT is available, saving
> additional power.
>
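Ah, understood. Just to make sure I'm reading you right, something
roughly like this on the arch side? (Untested sketch: the capability
check via alternative_has_cap_unlikely(ARM64_HAS_WFXT) is my
assumption, and TIMECHECK is taken to return the remaining ns, 0 once
expired, as in your sketch.)

#define cpu_poll_relax_timeout(__PTR, VAL, TIMECHECK) \
({ \
	if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) { \
		/* FEAT_WFXT: bounded wait, no event stream needed */ \
		u64 __t = (TIMECHECK); \
		if (__t) \
			__cmpwait_u64_timeout(__PTR, VAL, __t); \
	} else { \
		/* wfe + event stream; caller does the time check after the wait */ \
		__cmpwait_relaxed(__PTR, VAL); \
	} \
})

That would let the event stream be switched off whenever FEAT_WFXT is
present, with the wfe path keeping the current behaviour.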
>>> I have two lesser concerns with the generic definition here:
>>>
>>> - having both a timeout and a spin counter in the same loop
>>> feels redundant and error-prone, as the behavior in practice
>>> would likely depend a lot on the platform. What is the reason
>>> for keeping the counter if we already have a fixed timeout
>>> condition?
>>
>> The main reason was that the time check is expensive in power terms,
>> which is fine for platforms with a WFE-like primitive, but others
>> want to do the time check only infrequently. That's why poll_idle()
>> introduced a rate limit on polling (which the generic definition
>> reused here).
>
> Right, I misunderstood how this works, so this part is fine.
>
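For completeness, the rate limiting I was referring to has roughly
this shape in the generic polling fallback (illustrative sketch only,
not the exact code in the series; SMP_TIMEOUT_POLL_COUNT is a
placeholder name):

#define smp_cond_load_relaxed_timeout(ptr, cond_expr, time_check_expr) \
({ \
	typeof(ptr) __PTR = (ptr); \
	__unqual_scalar_typeof(*ptr) VAL; \
	u32 __n = 0; \
	for (;;) { \
		VAL = READ_ONCE(*__PTR); \
		if (cond_expr) \
			break; \
		cpu_relax(); \
		/* pay for the (possibly expensive) time check only rarely */ \
		if (++__n < SMP_TIMEOUT_POLL_COUNT) \
			continue; \
		if (time_check_expr) \
			break; \
		__n = 0; \
	} \
	(typeof(*ptr))VAL; \
})

i.e. time_check_expr is only evaluated once every SMP_TIMEOUT_POLL_COUNT
iterations of cpu_relax(), whereas an arch with a WFE-like primitive can
afford to check on every wakeup.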
>>> - I generally dislike the type-agnostic macros like this one,
>>> it adds a lot of extra complexity here that I feel can be
>>> completely avoided if we make explicitly 32-bit and 64-bit
>>> wide versions of these macros. We probably won't be able
>>> to resolve this as part of your series, but ideally I'd like
>>> have explicitly-typed versions of cmpxchg(), smp_load_acquire()
>>> and all the related ones, the same way we do for atomic_*()
>>> and atomic64_*().
>>
>> Ah. And the caller uses, say, smp_load_acquire_long() or whatever, and
>> that resolves to whatever makes sense for the arch.
>>
>> The __unqual_scalar_typeof() does look pretty ugly when looking at the
>> preprocessed version, but other than that smp_cond_load() etc. look
>> pretty straightforward. Just for my curiosity, could you elaborate on
>> the complexity?
>
> The nested macros with __unqual_scalar_typeof() make the
> preprocessed version completely unreadable, especially when
> combined with other sets of complex macros like min()/max(),
> tracepoints or arm64 atomics.
>
> If something goes wrong inside of it, it becomes rather
> hard for people to debug or even read the compiler warnings.
> Since I do a lot of build testing, I do tend to run into those
> more than others do.
>
> I've also seen cases where excessively complex macros get
> nested to the point of slowing down the kernel build, which
> can happen once the nesting expands to megabytes of source
> code.
Thanks for the detail. Yeah, looking at the preprocessed code (as I had
to do while working on this), it all really does feel quite excessive,
especially when combined with other complex macro constructs.
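
Just to check that I understand the shape of what you're suggesting,
something like this? (Illustrative only: a generic, barrier-based
fallback; the _u32/_u64 names aren't a proposal.)

static __always_inline u32 smp_load_acquire_u32(const volatile u32 *p)
{
	u32 val = READ_ONCE(*p);

	smp_mb();	/* order the load before later memory accesses */
	return val;
}

static __always_inline u64 smp_load_acquire_u64(const volatile u64 *p)
{
	u64 val = READ_ONCE(*p);

	smp_mb();
	return val;
}

with smp_cond_load_*_timeout_u32()/_u64() built on top in the same way,
so the generic timeout loop never needs __unqual_scalar_typeof() at all.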
--
ankur