From: David Laight <david.laight.linux@gmail.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
bpf@vger.kernel.org, arnd@arndb.de, catalin.marinas@arm.com,
will@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
mark.rutland@arm.com, harisokn@amazon.com, cl@gentwo.org,
ast@kernel.org, rafael@kernel.org, daniel.lezcano@linaro.org,
memxor@gmail.com, zhenglifeng1@huawei.com,
xueshuai@linux.alibaba.com, rdunlap@infradead.org,
joao.m.martins@oracle.com, boris.ostrovsky@oracle.com,
konrad.wilk@oracle.com, ashok.bhat@arm.com
Subject: Re: [PATCH v11 01/14] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Date: Thu, 7 May 2026 10:57:21 +0100
Message-ID: <20260507105721.66ba1e45@pumpkin>
In-Reply-To: <87o6isl0nl.fsf@oracle.com>
On Wed, 06 May 2026 13:54:06 -0700
Ankur Arora <ankur.a.arora@oracle.com> wrote:
> David Laight <david.laight.linux@gmail.com> writes:
>
> > On Wed, 06 May 2026 00:30:29 -0700
> > Ankur Arora <ankur.a.arora@oracle.com> wrote:
> >
> >> Ankur Arora <ankur.a.arora@oracle.com> writes:
> >>
> >> > Add smp_cond_load_relaxed_timeout(), which extends
> >> > smp_cond_load_relaxed() to allow waiting for a duration.
> >> >
> >> > We loop around waiting for the condition variable to change while
> >> > periodically doing a time-check. The loop uses cpu_poll_relax() to slow
> >> > down the busy-wait, which, unless overridden by the architecture
> >> > code, amounts to a cpu_relax().
> >> >
> >> > Note that there are two ways for the time-check to fail: the timeout
> >> > case, or @time_expr_ns returning an invalid value (negative or zero).
> >> > The second failure mode allows for clocks attached to the clock-domain
> >> > of @cond_expr -- which might cease to operate meaningfully once some
> >> > state internal to @cond_expr has changed -- to fail.
> >> >
> >> > Evaluation of @time_expr_ns: in the fastpath we want to keep the
> >> > performance close to smp_cond_load_relaxed(). So defer evaluation
> >> > of the potentially costly @time_expr_ns to the slowpath.
> >> >
> >> > This also means that some hardware-dependent duration will always
> >> > have passed in cpu_poll_relax() iterations by the time of the first
> >> > evaluation. Additionally, cpu_poll_relax() is not guaranteed to
> >> > return at a timeout boundary. In sum, expect timeout overshoot when
> >> > we exit due to expiration of the timeout.
> >> >
> >> > The number of spin iterations before a time-check, SMP_TIMEOUT_POLL_COUNT,
> >> > is chosen to be 200 by default. With a cpu_poll_relax() iteration
> >> > taking ~20-30 cycles (measured on a variety of x86 platforms), we
> >> > expect a time-check every ~4000-6000 cycles.
> >> >
> >> > The outer limit of the overshoot is double that when working with the
> >> > parameters above. This might be higher or lower depending on the
> >> > implementation of cpu_poll_relax() across architectures.
> >> >
> >> > Lastly, config option ARCH_HAS_CPU_RELAX indicates availability of a
> >> > cpu_poll_relax() that is cheaper than polling. This might be relevant
> >> > for cases with a long timeout.
> >> >
> >> > Cc: Arnd Bergmann <arnd@arndb.de>
> >> > Cc: Will Deacon <will@kernel.org>
> >> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> >> > Cc: Peter Zijlstra <peterz@infradead.org>
> >> > Cc: linux-arch@vger.kernel.org
> >> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> >> > Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> >> > ---
> >> > Notes:
> >> > - add a comment mentioning that smp_cond_load_relaxed_timeout() might
> >> > be using architectural primitives that don't support MMIO.
> >> > (David Laight, Catalin Marinas)
> >> >
> >> > include/asm-generic/barrier.h | 69 +++++++++++++++++++++++++++++++++++
> >> > 1 file changed, 69 insertions(+)
> >> >
> >> > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> >> > index d4f581c1e21d..e5a6a1c04649 100644
> >> > --- a/include/asm-generic/barrier.h
> >> > +++ b/include/asm-generic/barrier.h
> >> > @@ -273,6 +273,75 @@ do { \
> >> > })
> >> > #endif
> >> >
> >> > +/*
> >> > + * Number of times we iterate in the loop before doing the time check.
> >> > + * Note that the iteration count assumes that the loop condition is
> >> > + * relatively cheap.
> >> > + */
> >> > +#ifndef SMP_TIMEOUT_POLL_COUNT
> >> > +#define SMP_TIMEOUT_POLL_COUNT 200
> >> > +#endif
> >> > +
> >> > +/*
> >> > + * Platforms with ARCH_HAS_CPU_RELAX have a cpu_poll_relax() implementation
> >> > + * that is expected to be cheaper (lower power) than pure polling.
> >> > + */
> >> > +#ifndef cpu_poll_relax
> >> > +#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
> >> > +#endif
> >> > +
> >> > +/**
> >> > + * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
> >> > + * guarantees until a timeout expires.
> >> > + * @ptr: pointer to the variable to wait on.
> >> > + * @cond_expr: boolean expression to wait for.
> >> > + * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
> >> > + * on failure, returns a zero or negative value.
> >> > + * @timeout_ns: timeout value in ns
> >> > + * Both of the above are assumed to be compatible with s64; the signed
> >> > + * value is used to handle the failure case in @time_expr_ns.
> >> > + *
> >> > + * Equivalent to using READ_ONCE() on the condition variable.
> >> > + *
> >> > + * Callers that expect to wait for prolonged durations might want
> >> > + * to take into account the availability of ARCH_HAS_CPU_RELAX.
> >> > + *
> >> > + * Note that @ptr is expected to point to a memory address. Using this
> >> > + * interface with MMIO will be slower (since SMP_TIMEOUT_POLL_COUNT is
> >> > + * tuned for memory) and might also break in interesting architecture
> >> > + * dependent ways.
> >> > + */
> >> > +#ifndef smp_cond_load_relaxed_timeout
> >> > +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
> >> > + time_expr_ns, timeout_ns) \
> >> > +({ \
> >> > + typeof(ptr) __PTR = (ptr); \
This could just be:
	auto __PTR = ptr;
> >> > + __unqual_scalar_typeof(*ptr) VAL; \
It can't matter if integer promotions happen before assigning to VAL.
So something like:
auto VAL = 1 ? 0 : *__PTR + 0;
will generate a suitable writable variable.
(The '+ 0' is needed because some versions of gcc incorrectly propagate
'const'.)
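As a standalone illustration of that trick (a minimal sketch, not
kernel code; it uses gcc's __auto_type so it builds with a plain C
compiler):

	#include <stdio.h>

	int main(void)
	{
		const volatile int x = 42;
		const volatile int *p = &x;

		/* The unevaluated arm '*p + 0' supplies the promoted,
		 * unqualified type, so __auto_type deduces a plain
		 * writable int. The '+ 0' sidesteps gcc versions that
		 * wrongly propagate 'const'. */
		__auto_type v = 1 ? 0 : *p + 0;

		v = *p;		/* writable: qualifiers dropped */
		printf("%d\n", v);
		return 0;
	}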
> >> > + u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
> >> > + s64 __timeout = (s64)timeout_ns; \
The (s64) cast can only hide errors.
> >> > + s64 __time_now, __time_end = 0; \
> >> > + \
> >> > + for (;;) { \
> >> > + VAL = READ_ONCE(*__PTR); \
> >> > + if (cond_expr) \
> >> > + break; \
> >> > + cpu_poll_relax(__PTR, VAL, (u64)__timeout); \
That doesn't look right: __timeout is relative; if the underlying code
uses the timeout then the code delays for 200 * timeout_ns before even
looking at the actual time.
If you want to spin then you may not even want the cpu_relax() at all.
(Or with a very short timeout as in the version below.)
> >> > + if (++__n < __spin) \
> >> > + continue; \
> >> > + __time_now = (s64)(time_expr_ns); \
Another cast that can only hide bugs.
> >> > + if (unlikely(__time_end == 0)) \
> >> > + __time_end = __time_now + __timeout; \
> >> > + __timeout = __time_end - __time_now; \
> >> > + if (__time_now <= 0 || __timeout <= 0) { \
> >> > + VAL = READ_ONCE(*__PTR); \
> >> > + break; \
> >> > + } \
> >> > + __n = 0; \
Resetting the spin count doesn't look right at all.
In principle the code will delay for 200 * __timeout.
Possibly the earlier check should be:
if (__n < __spin) {
__n++;
continue;
}
(Or just don't worry that the code will spin again after 4M loops.)
The problem you have is that if cpu_poll_relax() ignores the timeout you
probably want to spin 'for a bit' in code that only accesses local data
(in particular avoiding evaluating cond_expr or time_expr_ns).
> >> > + } \
> >> > + (typeof(*ptr))VAL; \
That cast is pointless; the return value will be subject to 'integer promotion'
and converted to an rvalue - which removes any const/volatile qualifiers.
> >> > +})
> >> > +#endif
> >> > +
> >>
> >> A cluster of issues that got flagged by sashiko was around timeout_ns
> >> being specified as s64 and a bunch of potential edge cases around
> >> that.
> >>
> >> These were mostly caused by an implicit assumption in the code that
> >> the timeout specified by the caller is generally reasonable. So, way
> >> below S64_MAX, not 0 etc.
> >
> > There are plenty of ways kernel code can break things.
> > Provided this code doesn't itself overflow anywhere (rather than
> > just loop forever or return immediately etc) I'd be tempted to
> > just document the valid range rather than slow everything down
> > with the extra tests.
>
> I don't disagree. In this case, however, it's somewhat borderline.
>
> On the pro side, we get rid of some of the implicit type conversions
> and assumptions around those.
>
> On the negative, it adds an extra modulo operation in the slow path.
> And, the for loop is structured a little differently from the usual
> version.
>
> On balance, I think this is a good change if only because it makes
> the types a little more explicit.
>
> Ankur
>
> > David
> >
> >>
> >> I think this is worth cleaning up a bit. The change is mostly around
> >> introducing a u32 __itertime and explicitly computing the waiting time.
> >> And adding a check to ensure that we start with a valid value.
> >>
> >> This does make the implementation a little more involved. So just wanted
> >> to see if people have any opinions on this?
> >>
> >> +#ifndef smp_cond_load_relaxed_timeout
> >> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
> >> + time_expr_ns, timeout_ns) \
> >> +({ \
> >> + typeof(ptr) __PTR = (ptr); \
> >> + __unqual_scalar_typeof(*(ptr)) VAL; \
> >> + u32 __count = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
> >> + s64 __timeout = (s64)(timeout_ns); \
> >> + s64 __time_now, __time_end = 0; \
> >> + u32 __maybe_unused __itertime; \
> >> + \
> >> + for (__itertime = NSEC_PER_USEC; \
Ok, so that limits the initial 'spin' to 200 usecs.
That gets added to any caller-specified timeout.
> >> + VAL = READ_ONCE(*__PTR), __timeout > 0; ) { \
Broken indentation.
I'd change it back to a for (;;) loop.
If __timeout <= 0 then the code goes through the 'timer expired'
path (below) on the first iteration.
So the extra check is just bloat.
> >> + if (cond_expr) \
> >> + break; \
> >> + cpu_poll_relax(__PTR, VAL, __itertime); \
> >> + if (++__count < __spin) \
> >> + continue; \
> >> + __time_now = (s64)(time_expr_ns); \
> >> + if (unlikely(__time_end == 0)) \
> >> + __time_end = __time_now + __timeout; \
> >> + __timeout = __time_end - __time_now; \
> >> + if (__time_now <= 0 || __timeout <= 0) { \
> >> + VAL = READ_ONCE(*__PTR); \
> >> + break; \
> >> + } \
How about:
if (unlikely(__time_end == 0)) {
if (__time_now <= 0)
goto timed_out;
__time_end = __time_now + __timeout;
} else {
if (__time_now >= __time_end) {
timed_out:
VAL = READ_ONCE(*__PTR);
break;
}
__timeout = __time_end - __time_now;
}
> >> + __itertime = __timeout % NSEC_PER_MSEC + \
> >> + NSEC_PER_USEC; \
That seems to just be putting a bound on the timeout.
So the '% NSEC_PER_MSEC' could be '& ((1u << 20) - 1)'
replacing an expensive signed divide with a cheap mask.
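For illustration, that one-liner would then become something like (the
mask bound is ~1.05ms rather than 1ms, which is close enough here):

	__itertime = (__timeout & ((1u << 20) - 1)) + NSEC_PER_USEC;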
But overall this is a lot of code to inline.
It has to be possible to get it down to something like:
struct info info = { .tmo_ns = timeout_ns };
for (;;) {
VAL = READ_ONCE(*PTR);
if (cond_expr)
break;
if (_smp_cond_load_relaxed_timeout(&info, PTR, VAL))
break;
}
VAL;
(yes, I know it isn't that simple because the arm 'relax' code
has a re-read in it so needs to know the size.)
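For illustration, a rough sketch of such a helper (all names here are
made up, and it assumes a fixed clock such as ktime_get_mono_fast_ns()
in place of the caller-supplied @time_expr_ns):

	struct cond_timeout_info {
		s64 tmo_ns;	/* relative timeout, set by the caller */
		s64 end_ns;	/* absolute deadline; 0 until first time-check */
		u32 count;	/* iterations since the last time-check */
	};

	/* Returns true when the wait should stop (timeout or clock failure). */
	static bool _smp_cond_load_timeout_expired(struct cond_timeout_info *info)
	{
		s64 now;

		if (++info->count < SMP_TIMEOUT_POLL_COUNT)
			return false;
		info->count = 0;

		now = ktime_get_mono_fast_ns();
		if (now <= 0)				/* clock failure */
			return true;
		if (unlikely(info->end_ns == 0))	/* first time-check */
			info->end_ns = now + info->tmo_ns;
		return now >= info->end_ns;
	}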
-- David
> >> + __count = 0; \
> >> + } \
> >> + (typeof(*(ptr)))VAL; \
> >> +})
> >> +#endif
> >>
> >> Thanks
> >>
> >> --
> >> ankur
> >>
>
>
> --
> ankur
>