From: Ankur Arora <ankur.a.arora@oracle.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-pm@vger.kernel.org,
bpf@vger.kernel.org, arnd@arndb.de, catalin.marinas@arm.com,
will@kernel.org, peterz@infradead.org, akpm@linux-foundation.org,
mark.rutland@arm.com, harisokn@amazon.com, cl@gentwo.org,
ast@kernel.org, rafael@kernel.org, daniel.lezcano@linaro.org,
memxor@gmail.com, zhenglifeng1@huawei.com,
xueshuai@linux.alibaba.com, joao.m.martins@oracle.com,
boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subject: Re: [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
Date: Sun, 21 Dec 2025 23:47:09 -0800 [thread overview]
Message-ID: <87cy47t0ua.fsf@oracle.com> (raw)
In-Reply-To: <20251215044919.460086-2-ankur.a.arora@oracle.com>
Ankur Arora <ankur.a.arora@oracle.com> writes:
> Add smp_cond_load_relaxed_timeout(), which extends
> smp_cond_load_relaxed() to allow waiting for a duration.
>
> We loop around waiting for the condition variable to change while
> periodically doing a time-check. The loop uses cpu_poll_relax()
> to slow down the busy-waiting, which, unless overridden by the
> architecture code, amounts to a cpu_relax().
>
> Note that there are two ways for the time-check to fail: either we
> have timed out, or time_expr_ns returns an invalid value (negative
> or zero).
>
> The number of times we spin before doing the time-check is specified
> by SMP_TIMEOUT_POLL_COUNT (chosen to be 200 by default) which,
> assuming each cpu_poll_relax() iteration takes ~20-30 cycles (measured
> on a variety of x86 platforms), amounts to ~4000-6000 cycles between
> successive time-checks.
>
> That is also the outer limit of the overshoot when working with the
> parameters above. It might be higher or lower depending on the
> implementation of cpu_poll_relax() across architectures.
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: linux-arch@vger.kernel.org
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>
> Notes:
>
> - the interface now breaks the time_check_expr into two parts:
> time_expr_ns (evaluates to current time) and timeout_ns. The main
> reason for doing this was to support WFET and similar primitives
> which can do timed waiting.
>
> - cpu_poll_relax() now takes an additional parameter to handle that.
>
> - time_expr_ns can now return failure, which needs a little more
> reorganization. This was needed because the rqspinlock check_timeout()
> logic mapped naturally to the unified check in time_check_expr.
> Breaking up the time_check_expr, however, needed check_timeout() to be
> separated into a clock interface (which could fail on deadlock or on
> its internal timeout check) and the timeout duration.
>
> - given the changes in logic I've removed Catalin and Haris' R-by and
> Tested-by.
>
> include/asm-generic/barrier.h | 58 +++++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index d4f581c1e21d..e25592f9fcbf 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -273,6 +273,64 @@ do { \
> })
> #endif
>
> +/*
> + * Number of times we iterate in the loop before doing the time check.
> + */
> +#ifndef SMP_TIMEOUT_POLL_COUNT
> +#define SMP_TIMEOUT_POLL_COUNT 200
> +#endif
> +
> +#ifndef cpu_poll_relax
> +#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
> +#endif
> +
> +/**
> + * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
> + * guarantees until a timeout expires.
> + * @ptr: pointer to the variable to wait on
> + * @cond: boolean expression to wait for
> + * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
> + * on failure, returns a negative value.
> + * @timeout_ns: timeout value in ns
> + * (Both of the above are assumed to be compatible with s64.)
> + *
> + * Equivalent to using READ_ONCE() on the condition variable.
> + */
> +#ifndef smp_cond_load_relaxed_timeout
> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
> + time_expr_ns, timeout_ns) \
> +({ \
> + __label__ __out, __done; \
> + typeof(ptr) __PTR = (ptr); \
> + __unqual_scalar_typeof(*ptr) VAL; \
> + u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
> + s64 __time_now = (s64)(time_expr_ns); \
> + s64 __timeout = (s64)timeout_ns; \
> + s64 __time_end = __time_now + __timeout; \
> + \
> + if (__time_now <= 0) \
> + goto __out; \
> + \
> + for (;;) { \
> + VAL = READ_ONCE(*__PTR); \
> + if (cond_expr) \
> + goto __done; \
> + cpu_poll_relax(__PTR, VAL, __timeout); \
> + if (++__n < __spin) \
> + continue; \
> + __time_now = (s64)(time_expr_ns); \
> + __timeout = __time_end - __time_now; \
> + if (__time_now <= 0 || __timeout <= 0) \
> + goto __out; \
> + __n = 0; \
> + } \
> +__out: \
> + VAL = READ_ONCE(*__PTR); \
> +__done: \
> + (typeof(*ptr))VAL; \
> +})
> +#endif
> +
> /*
> * pmem_wmb() ensures that all stores for which the modification
> * are written to persistent storage by preceding instructions have
Alexei Starovoitov pointed out in his review comment on patch 10 that
evaluating time_expr_ns on entry makes this unusable in the fastpath of
something like lock acquisition (as rqspinlock does).

Something like the diff at the bottom should work better (it also gets
rid of the gotos). The only negative is that the slowpath would have a
larger overshoot, but I think that's a reasonable tradeoff.
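For illustration, a caller of the new interface could look roughly like
the sketch below (a hypothetical example, not code from the series;
wait_for_flag() is made up, and ktime_get_mono_fast_ns() just stands in
for whatever clock expression the caller would use):

  /*
   * Spin until *flag becomes non-zero, giving up after ~100us. The clock
   * expression is only evaluated once every SMP_TIMEOUT_POLL_COUNT
   * iterations of the poll loop.
   */
  static bool wait_for_flag(unsigned long *flag)
  {
          unsigned long val;

          val = smp_cond_load_relaxed_timeout(flag, VAL != 0,
                                              ktime_get_mono_fast_ns(),
                                              100 * NSEC_PER_USEC);
          return val != 0;
  }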
I'll be sending out the next version with this change, but I wanted to
gauge any preliminary opinions on it first.
Thanks
Ankur
---
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index d4f581c1e21d..278eb51cf354 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -273,6 +273,73 @@ do { \
})
#endif
+/*
+ * Number of times we iterate in the loop before doing the time check.
+ */
+#ifndef SMP_TIMEOUT_POLL_COUNT
+#define SMP_TIMEOUT_POLL_COUNT 200
+#endif
+
+#ifndef cpu_poll_relax
+#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
+#endif
+
+/**
+ * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
+ * guarantees until a timeout expires.
+ * @ptr: pointer to the variable to wait on
+ * @cond: boolean expression to wait for
+ * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
+ * on failure, to zero or a negative value.
+ * @timeout_ns: timeout value in ns
+ * (Both of the above are assumed to be compatible with s64.)
+ *
+ * Equivalent to using READ_ONCE() on the condition variable.
+ *
+ * Note on timeout overshoot: in the fastpath, we want to keep this as
+ * close in performance to smp_cond_load_relaxed() as possible. To do that
+ * we defer the evaluation of the potentially expensive @time_expr_ns
+ * until it is first needed.
+ *
+ * This means that some hardware dependent duration will always have
+ * passed inside cpu_poll_relax() by the time of the first evaluation.
+ * In addition, cpu_poll_relax() is not guaranteed to return at the
+ * timeout boundary.
+ *
+ * In sum, expect some timeout overshoot when we return due to expiration
+ * of the timeout.
+ */
+#ifndef smp_cond_load_relaxed_timeout
+#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
+ time_expr_ns, timeout_ns) \
+({ \
+ typeof(ptr) __PTR = (ptr); \
+ __unqual_scalar_typeof(*ptr) VAL; \
+ u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
+ s64 __timeout = (s64)timeout_ns; \
+ s64 __time_now, __time_end = 0; \
+ \
+ for (;;) { \
+ VAL = READ_ONCE(*__PTR); \
+ if (cond_expr) \
+ break; \
+ cpu_poll_relax(__PTR, VAL, __timeout); \
+ if (++__n < __spin) \
+ continue; \
+ __time_now = (s64)(time_expr_ns); \
+ if (unlikely(__time_end == 0)) \
+ __time_end = __time_now + __timeout; \
+ __timeout = __time_end - __time_now; \
+ if (__time_now <= 0 || __timeout <= 0) { \
+ VAL = READ_ONCE(*__PTR); \
+ break; \
+ } \
+ __n = 0; \
+ } \
+ (typeof(*ptr))VAL; \
+})
+#endif
+
/*
* pmem_wmb() ensures that all stores for which the modification
* are written to persistent storage by preceding instructions have
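As an aside on cpu_poll_relax(): the intent is that architectures with a
lightweight wait-on-memory primitive override it instead of spinning on
cpu_relax(). Just to illustrate the shape such an override could take (a
rough sketch only, not the actual arm64 patches in this series), an
arm64-style version might map it onto the existing WFE-based
__cmpwait_relaxed() helper:

  /*
   * Illustrative sketch: wait for *ptr to move on from val rather than
   * spinning; the WFE event stream bounds the wait even if no store to
   * *ptr arrives, so the periodic time-check still gets to run.
   */
  #define cpu_poll_relax(ptr, val, timeout_ns)  __cmpwait_relaxed(ptr, val)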