* [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout()
@ 2025-12-15 4:49 Ankur Arora
2025-12-15 4:49 ` [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
` (11 more replies)
0 siblings, 12 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
This series adds waited variants of the smp_cond_load() primitives:
smp_cond_load_relaxed_timeout() and smp_cond_load_acquire_timeout().
As the name suggests, the new interfaces are meant for contexts where
you want to wait on a condition variable for a finite duration. This is
easy enough to do with a loop around cpu_relax(). There are, however,
architectures (e.g. arm64) that allow waiting on a cacheline instead.
So, these interfaces handle a mixture of spinning and waiting, with an
smp_cond_load() thrown in. The interfaces are:
smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, timeout)
smp_cond_load_acquire_timeout(ptr, cond_expr, time_expr, timeout)
The parameters time_expr and timeout determine when to bail out.
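As an editorial illustration (not taken from the series; the helper and
the 1ms budget are made up), a caller waiting for a flag to become
non-zero, using local_clock() as the time base, might look like:

    #include <linux/sched/clock.h>
    #include <linux/time64.h>
    #include <asm/barrier.h>

    static bool wait_for_flag(u32 *flag)
    {
            u32 val;

            /* Spin/wait until *flag != 0, bailing out after 1ms. */
            val = smp_cond_load_relaxed_timeout(flag, VAL != 0,
                                                (s64)local_clock(),
                                                NSEC_PER_MSEC);
            return val != 0;
    }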
Also add tif_need_resched_relaxed_wait() which wraps the pattern used
in poll_idle() and abstracts out details of the interface and those
of the scheduler.
In addition add atomic_cond_read_*_timeout(), atomic64_cond_read_*_timeout(),
and atomic_long wrappers to the interfaces.
Finally update poll_idle() and resilient queued spinlocks to use them.
Changelog:
v7 [0]:
- change the interface to separately provide the timeout. This is
useful for supporting WFET and similar primitives which can do
timed waiting (suggested by Arnd Bergmann).
- Adapting rqspinlock code to this changed interface also
necessitated allowing time_expr to fail.
- rqspinlock changes to adapt to the new smp_cond_load_acquire_timeout().
- add WFET support (suggested by Arnd Bergmann).
- add support for atomic-long wrappers.
- add a new scheduler interface tif_need_resched_relaxed_wait() which
encapsulates the polling logic used by poll_idle().
- (interface suggested by Rafael J. Wysocki).
v6 [1]:
- fixup missing timeout parameters in atomic64_cond_read_*_timeout()
- remove a race between setting of TIF_NEED_RESCHED and the call to
smp_cond_load_relaxed_timeout(). This would mean that dev->poll_time_limit
would be set even if we hadn't spent any time waiting.
(The original check compared against local_clock(), which would have been
fine, but I was instead using a cheaper check against _TIF_NEED_RESCHED.)
(Both from meta-CI bot)
v5 [2]:
- use cpu_poll_relax() instead of cpu_relax().
- instead of defining an arm64 specific
smp_cond_load_relaxed_timeout(), just define the appropriate
cpu_poll_relax().
- re-read the target pointer when we exit due to the time-check.
- s/SMP_TIMEOUT_SPIN_COUNT/SMP_TIMEOUT_POLL_COUNT/
(Suggested by Will Deacon)
- add atomic_cond_read_*_timeout() and atomic64_cond_read_*_timeout()
interfaces.
- rqspinlock: use atomic_cond_read_acquire_timeout().
- cpuidle: use smp_cond_load_relaxed_timeout() for polling.
(Suggested by Catalin Marinas)
- rqspinlock: define SMP_TIMEOUT_POLL_COUNT to be 16k for non arm64
v4 [3]:
- naming change 's/timewait/timeout/'
- resilient spinlocks: get rid of res_smp_cond_load_acquire_waiting()
and fixup use of RES_CHECK_TIMEOUT().
(Both suggested by Catalin Marinas)
v3 [4]:
- further interface simplifications (suggested by Catalin Marinas)
v2 [5]:
- simplified the interface (suggested by Catalin Marinas)
- get rid of wait_policy, and a multitude of constants
- adds a slack parameter
This helped remove a fair amount of duplicated code and, in
hindsight, unnecessary constants.
v1 [6]:
- add wait_policy (coarse and fine)
- derive spin-count etc at runtime instead of using arbitrary
constants.
Haris Okanovic tested v4 of this series with poll_idle()/haltpoll patches. [7]
Comments appreciated!
Thanks
Ankur
[0] https://lore.kernel.org/lkml/20251028053136.692462-1-ankur.a.arora@oracle.com/
[1] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
[2] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
[3] https://lore.kernel.org/lkml/20250829080735.3598416-1-ankur.a.arora@oracle.com/
[4] https://lore.kernel.org/lkml/20250627044805.945491-1-ankur.a.arora@oracle.com/
[5] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
[6] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
[7] https://lore.kernel.org/lkml/2cecbf7fb23ee83a4ce027e1be3f46f97efd585c.camel@amazon.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pm@vger.kernel.org
Ankur Arora (12):
asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
arm64: barrier: Support smp_cond_load_relaxed_timeout()
arm64/delay: move some constants out to a separate header
arm64: support WFET in smp_cond_relaxed_timeout()
arm64: rqspinlock: Remove private copy of
smp_cond_load_acquire_timewait()
asm-generic: barrier: Add smp_cond_load_acquire_timeout()
atomic: Add atomic_cond_read_*_timeout()
locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
bpf/rqspinlock: switch check_timeout() to a clock interface
bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
sched: add need-resched timed wait interface
cpuidle/poll_state: Wait for need-resched via
tif_need_resched_relaxed_wait()
arch/arm64/include/asm/barrier.h | 23 ++++++++
arch/arm64/include/asm/cmpxchg.h | 72 ++++++++++++++++-------
arch/arm64/include/asm/delay-const.h | 25 ++++++++
arch/arm64/include/asm/rqspinlock.h | 85 ----------------------------
arch/arm64/lib/delay.c | 13 +----
drivers/cpuidle/poll_state.c | 27 ++-------
drivers/soc/qcom/rpmh-rsc.c | 9 +--
include/asm-generic/barrier.h | 84 +++++++++++++++++++++++++++
include/linux/atomic.h | 10 ++++
include/linux/atomic/atomic-long.h | 18 +++---
include/linux/sched/idle.h | 29 ++++++++++
kernel/bpf/rqspinlock.c | 72 ++++++++++++++---------
scripts/atomic/gen-atomic-long.sh | 16 ++++--
13 files changed, 295 insertions(+), 188 deletions(-)
create mode 100644 arch/arm64/include/asm/delay-const.h
--
2.31.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-22 7:47 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 02/12] arm64: barrier: Support smp_cond_load_relaxed_timeout() Ankur Arora
` (10 subsequent siblings)
11 siblings, 1 reply; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
Add smp_cond_load_relaxed_timeout(), which extends
smp_cond_load_relaxed() to allow waiting for a duration.
We loop around waiting for the condition variable to change while
periodically doing a time-check. The loop uses cpu_poll_relax()
to slow down the busy-waiting, which, unless overridden by the
architecture code, amounts to a cpu_relax().
Note that there are two ways for the time-check to fail: either we
have timed out, or time_expr_ns returns an invalid value (negative
or zero).
The number of times we spin before doing the time-check is specified
by SMP_TIMEOUT_POLL_COUNT (chosen to be 200 by default) which,
assuming each cpu_poll_relax() iteration takes ~20-30 cycles (measured
on a variety of x86 platforms), amounts to ~4000-6000 cycles between
time-checks.
That is also the outer limit of the overshoot when working with the
parameters above. It might be higher or lower depending on the
implementation of cpu_poll_relax() across architectures.
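For illustration only (my_clock_or_err() below is hypothetical and not
part of this patch), a caller whose time expression can itself fail,
triggering an early bail-out independent of the timeout:

    /*
     * my_clock_or_err() returns the current monotonic time in ns, or a
     * negative errno once some external abort condition is detected.
     * Either a timeout or a failed clock read terminates the wait.
     */
    val = smp_cond_load_relaxed_timeout(&flag, VAL != 0,
                                        my_clock_or_err(),
                                        10 * NSEC_PER_USEC);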
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
Notes:
- the interface now breaks the time_check_expr into two parts:
time_expr_ns (evaluates to current time) and remaining_ns. The main
reason for doing this was to support WFET and similar primitives
which can do timed waiting.
- cpu_poll_relax() now takes an additional parameter to handle that.
- time_expr_ns can now return failure, which needs a little more change
in organization. This was needed because the rqspinlock check_timeout()
logic mapped naturally onto the unified check in time_check_expr.
Breaking up the time_check_expr, however, needed check_timeout() to
separate out a clock interface (which could fail on deadlock or on its
internal timeout check) from the timeout duration.
- given the changes in logic I've removed Catalin and Haris' R-by and
Tested-by.
include/asm-generic/barrier.h | 58 +++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index d4f581c1e21d..e25592f9fcbf 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -273,6 +273,64 @@ do { \
})
#endif
+/*
+ * Number of times we iterate in the loop before doing the time check.
+ */
+#ifndef SMP_TIMEOUT_POLL_COUNT
+#define SMP_TIMEOUT_POLL_COUNT 200
+#endif
+
+#ifndef cpu_poll_relax
+#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
+#endif
+
+/**
+ * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
+ * guarantees until a timeout expires.
+ * @ptr: pointer to the variable to wait on
+ * @cond: boolean expression to wait for
+ * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
+ * on failure, returns a negative value.
+ * @timeout_ns: timeout value in ns
+ * (Both of the above are assumed to be compatible with s64.)
+ *
+ * Equivalent to using READ_ONCE() on the condition variable.
+ */
+#ifndef smp_cond_load_relaxed_timeout
+#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
+ time_expr_ns, timeout_ns) \
+({ \
+ __label__ __out, __done; \
+ typeof(ptr) __PTR = (ptr); \
+ __unqual_scalar_typeof(*ptr) VAL; \
+ u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
+ s64 __time_now = (s64)(time_expr_ns); \
+ s64 __timeout = (s64)timeout_ns; \
+ s64 __time_end = __time_now + __timeout; \
+ \
+ if (__time_now <= 0) \
+ goto __out; \
+ \
+ for (;;) { \
+ VAL = READ_ONCE(*__PTR); \
+ if (cond_expr) \
+ goto __done; \
+ cpu_poll_relax(__PTR, VAL, __timeout); \
+ if (++__n < __spin) \
+ continue; \
+ __time_now = (s64)(time_expr_ns); \
+ __timeout = __time_end - __time_now; \
+ if (__time_now <= 0 || __timeout <= 0) \
+ goto __out; \
+ __n = 0; \
+ } \
+__out: \
+ VAL = READ_ONCE(*__PTR); \
+__done: \
+ (typeof(*ptr))VAL; \
+})
+#endif
+
/*
* pmem_wmb() ensures that all stores for which the modification
* are written to persistent storage by preceding instructions have
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 02/12] arm64: barrier: Support smp_cond_load_relaxed_timeout()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 03/12] arm64/delay: move some constants out to a separate header Ankur Arora
` (9 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
Support waiting in smp_cond_load_relaxed_timeout() via
__cmpwait_relaxed(). To ensure that we wake from waiting in
WFE periodically and don't block forever if there are no stores
to ptr, this path is only used when the event-stream is enabled.
Note that when using __cmpwait_relaxed() we ignore the timeout
value, allowing an overshoot of up to the event-stream period.
And, in the unlikely event that the event-stream is unavailable,
fall back to spin-waiting.
Also set SMP_TIMEOUT_POLL_COUNT to 1 so we do the time-check in
each iteration of smp_cond_load_relaxed_timeout().
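For scale (an assumption on my part from mainline, not something this
patch changes), the overshoot bound mentioned above is the event-stream
period:

    /* include/clocksource/arm_arch_timer.h (mainline, for reference): */
    #define ARCH_TIMER_EVT_STREAM_PERIOD_US	100

so a WFE-based wait can overshoot the requested timeout by up to ~100us.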
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
Notes:
- cpu_poll_relax() now takes an additional parameter.
- added a comment detailing why we define SMP_TIMEOUT_POLL_COUNT=1 and
how it ties up with smp_cond_load_relaxed_timeout().
- explicitly include <asm/vdso/processor.h> for cpu_relax().
arch/arm64/include/asm/barrier.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 9495c4441a46..6190e178db51 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -12,6 +12,7 @@
#include <linux/kasan-checks.h>
#include <asm/alternative-macros.h>
+#include <asm/vdso/processor.h>
#define __nops(n) ".rept " #n "\nnop\n.endr\n"
#define nops(n) asm volatile(__nops(n))
@@ -219,6 +220,26 @@ do { \
(typeof(*ptr))VAL; \
})
+/* Re-declared here to avoid include dependency. */
+extern bool arch_timer_evtstrm_available(void);
+
+/*
+ * In the common case, cpu_poll_relax() sits waiting in __cmpwait_relaxed()
+ * for the ptr value to change.
+ *
+ * Since this period is reasonably long, choose SMP_TIMEOUT_POLL_COUNT
+ * to be 1, so smp_cond_load_{relaxed,acquire}_timeout() does a
+ * time-check in each iteration.
+ */
+#define SMP_TIMEOUT_POLL_COUNT 1
+
+#define cpu_poll_relax(ptr, val, timeout_ns) do { \
+ if (arch_timer_evtstrm_available()) \
+ __cmpwait_relaxed(ptr, val); \
+ else \
+ cpu_relax(); \
+} while (0)
+
#include <asm-generic/barrier.h>
#endif /* __ASSEMBLER__ */
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 03/12] arm64/delay: move some constants out to a separate header
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 02/12] arm64: barrier: Support smp_cond_load_relaxed_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 17:59 ` kernel test robot
2025-12-19 4:52 ` kernel test robot
2025-12-15 4:49 ` [PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout() Ankur Arora
` (8 subsequent siblings)
11 siblings, 2 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora,
Bjorn Andersson, Konrad Dybcio, Christoph Lameter
Move some constants and functions related to the xloops/cycles
computation out to a new header.
No functional change.
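As a sanity check on the constants being moved (illustrative only, not
part of the patch), they are the usual xloops multipliers rounded up:

    #include <linux/build_bug.h>
    #include <linux/math.h>
    #include <linux/time64.h>
    #include <asm/delay-const.h>

    /* 0x10C7 == 4295 == ceil(2^32 / 10^6); 0x5 == ceil(2^32 / 10^9). */
    static_assert(__usecs_to_xloops_mult == DIV_ROUND_UP(1ULL << 32, USEC_PER_SEC));
    static_assert(__nsecs_to_xloops_mult == DIV_ROUND_UP(1ULL << 32, NSEC_PER_SEC));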
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Konrad Dybcio <konradybcio@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
---
arch/arm64/include/asm/delay-const.h | 25 +++++++++++++++++++++++++
arch/arm64/lib/delay.c | 13 +++----------
drivers/soc/qcom/rpmh-rsc.c | 9 +--------
3 files changed, 29 insertions(+), 18 deletions(-)
create mode 100644 arch/arm64/include/asm/delay-const.h
diff --git a/arch/arm64/include/asm/delay-const.h b/arch/arm64/include/asm/delay-const.h
new file mode 100644
index 000000000000..63fb5fc24a90
--- /dev/null
+++ b/arch/arm64/include/asm/delay-const.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_DELAY_CONST_H
+#define _ASM_DELAY_CONST_H
+
+#include <asm/param.h> /* For HZ */
+
+/* 2**32 / 1000000 (rounded up) */
+#define __usecs_to_xloops_mult 0x10C7UL
+
+/* 2**32 / 1000000000 (rounded up) */
+#define __nsecs_to_xloops_mult 0x5UL
+
+extern unsigned long loops_per_jiffy;
+static inline unsigned long xloops_to_cycles(unsigned long xloops)
+{
+ return (xloops * loops_per_jiffy * HZ) >> 32;
+}
+
+#define USECS_TO_CYCLES(time_usecs) \
+ xloops_to_cycles((time_usecs) * __usecs_to_xloops_mult)
+
+#define NSECS_TO_CYCLES(time_nsecs) \
+ xloops_to_cycles((time_nsecs) * __nsecs_to_xloops_mult)
+
+#endif /* _ASM_DELAY_CONST_H */
diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
index cb2062e7e234..511b5597e2a5 100644
--- a/arch/arm64/lib/delay.c
+++ b/arch/arm64/lib/delay.c
@@ -12,17 +12,10 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/timex.h>
+#include <asm/delay-const.h>
#include <clocksource/arm_arch_timer.h>
-#define USECS_TO_CYCLES(time_usecs) \
- xloops_to_cycles((time_usecs) * 0x10C7UL)
-
-static inline unsigned long xloops_to_cycles(unsigned long xloops)
-{
- return (xloops * loops_per_jiffy * HZ) >> 32;
-}
-
void __delay(unsigned long cycles)
{
cycles_t start = get_cycles();
@@ -58,12 +51,12 @@ EXPORT_SYMBOL(__const_udelay);
void __udelay(unsigned long usecs)
{
- __const_udelay(usecs * 0x10C7UL); /* 2**32 / 1000000 (rounded up) */
+ __const_udelay(usecs * __usecs_to_xloops_mult);
}
EXPORT_SYMBOL(__udelay);
void __ndelay(unsigned long nsecs)
{
- __const_udelay(nsecs * 0x5UL); /* 2**32 / 1000000000 (rounded up) */
+ __const_udelay(nsecs * __nsecs_to_xloops_mult);
}
EXPORT_SYMBOL(__ndelay);
diff --git a/drivers/soc/qcom/rpmh-rsc.c b/drivers/soc/qcom/rpmh-rsc.c
index c6f7d5c9c493..95962fc37295 100644
--- a/drivers/soc/qcom/rpmh-rsc.c
+++ b/drivers/soc/qcom/rpmh-rsc.c
@@ -26,6 +26,7 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
+#include <asm/delay-const.h>
#include <clocksource/arm_arch_timer.h>
#include <soc/qcom/cmd-db.h>
@@ -146,14 +147,6 @@ enum {
* +---------------------------------------------------+
*/
-#define USECS_TO_CYCLES(time_usecs) \
- xloops_to_cycles((time_usecs) * 0x10C7UL)
-
-static inline unsigned long xloops_to_cycles(u64 xloops)
-{
- return (xloops * loops_per_jiffy * HZ) >> 32;
-}
-
static u32 rpmh_rsc_reg_offset_ver_2_7[] = {
[RSC_DRV_TCS_OFFSET] = 672,
[RSC_DRV_CMD_OFFSET] = 20,
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (2 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 03/12] arm64/delay: move some constants out to a separate header Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 05/12] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait() Ankur Arora
` (7 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
Extend __cmpwait_relaxed() to __cmpwait_relaxed_timeout(), which takes
an additional timeout value in ns.
Lacking WFET, or with a zero or negative timeout value, we fall back
to WFE.
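For reference (my assumption, based on how mainline already emits WFET
from arch/arm64/lib/delay.c), the raw system-register write added to
__cmpwait_case_##sz() is the sysreg encoding of WFET, i.e. the same form
as the existing wfet() helper in <asm/barrier.h>:

    #define wfet(val) \
    	asm volatile("msr s0_3_c1_c0_0, %0" : : "r" (val) : "memory")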
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/arm64/include/asm/barrier.h | 8 ++--
arch/arm64/include/asm/cmpxchg.h | 72 ++++++++++++++++++++++----------
2 files changed, 55 insertions(+), 25 deletions(-)
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 6190e178db51..fbd71cd4ef4e 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -224,8 +224,8 @@ do { \
extern bool arch_timer_evtstrm_available(void);
/*
- * In the common case, cpu_poll_relax() sits waiting in __cmpwait_relaxed()
- * for the ptr value to change.
+ * In the common case, cpu_poll_relax() sits waiting in __cmpwait_relaxed()/
+ * __cmpwait_relaxed_timeout() for the ptr value to change.
*
* Since this period is reasonably long, choose SMP_TIMEOUT_POLL_COUNT
* to be 1, so smp_cond_load_{relaxed,acquire}_timeout() does a
@@ -234,7 +234,9 @@ extern bool arch_timer_evtstrm_available(void);
#define SMP_TIMEOUT_POLL_COUNT 1
#define cpu_poll_relax(ptr, val, timeout_ns) do { \
- if (arch_timer_evtstrm_available()) \
+ if (alternative_has_cap_unlikely(ARM64_HAS_WFXT)) \
+ __cmpwait_relaxed_timeout(ptr, val, timeout_ns); \
+ else if (arch_timer_evtstrm_available()) \
__cmpwait_relaxed(ptr, val); \
else \
cpu_relax(); \
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index d7a540736741..acd01a203b62 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -12,6 +12,7 @@
#include <asm/barrier.h>
#include <asm/lse.h>
+#include <asm/delay-const.h>
/*
* We need separate acquire parameters for ll/sc and lse, since the full
@@ -208,22 +209,41 @@ __CMPXCHG_GEN(_mb)
__cmpxchg128((ptr), (o), (n)); \
})
-#define __CMPWAIT_CASE(w, sfx, sz) \
-static inline void __cmpwait_case_##sz(volatile void *ptr, \
- unsigned long val) \
-{ \
- unsigned long tmp; \
- \
- asm volatile( \
- " sevl\n" \
- " wfe\n" \
- " ldxr" #sfx "\t%" #w "[tmp], %[v]\n" \
- " eor %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n" \
- " cbnz %" #w "[tmp], 1f\n" \
- " wfe\n" \
- "1:" \
- : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr) \
- : [val] "r" (val)); \
+/* Re-declared here to avoid include dependency. */
+extern u64 (*arch_timer_read_counter)(void);
+
+#define __CMPWAIT_CASE(w, sfx, sz) \
+static inline void __cmpwait_case_##sz(volatile void *ptr, \
+ unsigned long val, \
+ s64 timeout_ns) \
+{ \
+ unsigned long tmp; \
+ \
+ if (!alternative_has_cap_unlikely(ARM64_HAS_WFXT) || timeout_ns <= 0) { \
+ asm volatile( \
+ " sevl\n" \
+ " wfe\n" \
+ " ldxr" #sfx "\t%" #w "[tmp], %[v]\n" \
+ " eor %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n" \
+ " cbnz %" #w "[tmp], 1f\n" \
+ " wfe\n" \
+ "1:" \
+ : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr) \
+ : [val] "r" (val)); \
+ } else { \
+ u64 ecycles = arch_timer_read_counter() + \
+ NSECS_TO_CYCLES(timeout_ns); \
+ asm volatile( \
+ " sevl\n" \
+ " wfe\n" \
+ " ldxr" #sfx "\t%" #w "[tmp], %[v]\n" \
+ " eor %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n" \
+ " cbnz %" #w "[tmp], 2f\n" \
+ " msr s0_3_c1_c0_0, %[ecycles]\n" \
+ "2:" \
+ : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr) \
+ : [val] "r" (val), [ecycles] "r" (ecycles)); \
+ } \
}
__CMPWAIT_CASE(w, b, 8);
@@ -236,17 +256,22 @@ __CMPWAIT_CASE( , , 64);
#define __CMPWAIT_GEN(sfx) \
static __always_inline void __cmpwait##sfx(volatile void *ptr, \
unsigned long val, \
+ s64 timeout_ns, \
int size) \
{ \
switch (size) { \
case 1: \
- return __cmpwait_case##sfx##_8(ptr, (u8)val); \
+ return __cmpwait_case##sfx##_8(ptr, (u8)val, \
+ timeout_ns); \
case 2: \
- return __cmpwait_case##sfx##_16(ptr, (u16)val); \
+ return __cmpwait_case##sfx##_16(ptr, (u16)val, \
+ timeout_ns); \
case 4: \
- return __cmpwait_case##sfx##_32(ptr, val); \
+ return __cmpwait_case##sfx##_32(ptr, val, \
+ timeout_ns); \
case 8: \
- return __cmpwait_case##sfx##_64(ptr, val); \
+ return __cmpwait_case##sfx##_64(ptr, val, \
+ timeout_ns); \
default: \
BUILD_BUG(); \
} \
@@ -258,7 +283,10 @@ __CMPWAIT_GEN()
#undef __CMPWAIT_GEN
-#define __cmpwait_relaxed(ptr, val) \
- __cmpwait((ptr), (unsigned long)(val), sizeof(*(ptr)))
+#define __cmpwait_relaxed_timeout(ptr, val, timeout_ns) \
+ __cmpwait((ptr), (unsigned long)(val), timeout_ns, sizeof(*(ptr)))
+
+#define __cmpwait_relaxed(ptr, val) \
+ __cmpwait_relaxed_timeout(ptr, val, 0)
#endif /* __ASM_CMPXCHG_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 05/12] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (3 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 5:14 ` bot+bpf-ci
2025-12-15 4:49 ` [PATCH v8 06/12] asm-generic: barrier: Add smp_cond_load_acquire_timeout() Ankur Arora
` (6 subsequent siblings)
11 siblings, 1 reply; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
In preparation for defining smp_cond_load_acquire_timeout(), remove
the private copy. Without it, the rqspinlock code falls back to using
the generic smp_cond_load_acquire().
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Haris Okanovic <harisokn@amazon.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/arm64/include/asm/rqspinlock.h | 85 -----------------------------
1 file changed, 85 deletions(-)
diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h
index 9ea0a74e5892..a385603436e9 100644
--- a/arch/arm64/include/asm/rqspinlock.h
+++ b/arch/arm64/include/asm/rqspinlock.h
@@ -3,91 +3,6 @@
#define _ASM_RQSPINLOCK_H
#include <asm/barrier.h>
-
-/*
- * Hardcode res_smp_cond_load_acquire implementations for arm64 to a custom
- * version based on [0]. In rqspinlock code, our conditional expression involves
- * checking the value _and_ additionally a timeout. However, on arm64, the
- * WFE-based implementation may never spin again if no stores occur to the
- * locked byte in the lock word. As such, we may be stuck forever if
- * event-stream based unblocking is not available on the platform for WFE spin
- * loops (arch_timer_evtstrm_available).
- *
- * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this
- * copy-paste.
- *
- * While we rely on the implementation to amortize the cost of sampling
- * cond_expr for us, it will not happen when event stream support is
- * unavailable, time_expr check is amortized. This is not the common case, and
- * it would be difficult to fit our logic in the time_expr_ns >= time_limit_ns
- * comparison, hence just let it be. In case of event-stream, the loop is woken
- * up at microsecond granularity.
- *
- * [0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com
- */
-
-#ifndef smp_cond_load_acquire_timewait
-
-#define smp_cond_time_check_count 200
-
-#define __smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns, \
- time_limit_ns) ({ \
- typeof(ptr) __PTR = (ptr); \
- __unqual_scalar_typeof(*ptr) VAL; \
- unsigned int __count = 0; \
- for (;;) { \
- VAL = READ_ONCE(*__PTR); \
- if (cond_expr) \
- break; \
- cpu_relax(); \
- if (__count++ < smp_cond_time_check_count) \
- continue; \
- if ((time_expr_ns) >= (time_limit_ns)) \
- break; \
- __count = 0; \
- } \
- (typeof(*ptr))VAL; \
-})
-
-#define __smp_cond_load_acquire_timewait(ptr, cond_expr, \
- time_expr_ns, time_limit_ns) \
-({ \
- typeof(ptr) __PTR = (ptr); \
- __unqual_scalar_typeof(*ptr) VAL; \
- for (;;) { \
- VAL = smp_load_acquire(__PTR); \
- if (cond_expr) \
- break; \
- __cmpwait_relaxed(__PTR, VAL); \
- if ((time_expr_ns) >= (time_limit_ns)) \
- break; \
- } \
- (typeof(*ptr))VAL; \
-})
-
-#define smp_cond_load_acquire_timewait(ptr, cond_expr, \
- time_expr_ns, time_limit_ns) \
-({ \
- __unqual_scalar_typeof(*ptr) _val; \
- int __wfe = arch_timer_evtstrm_available(); \
- \
- if (likely(__wfe)) { \
- _val = __smp_cond_load_acquire_timewait(ptr, cond_expr, \
- time_expr_ns, \
- time_limit_ns); \
- } else { \
- _val = __smp_cond_load_relaxed_spinwait(ptr, cond_expr, \
- time_expr_ns, \
- time_limit_ns); \
- smp_acquire__after_ctrl_dep(); \
- } \
- (typeof(*ptr))_val; \
-})
-
-#endif
-
-#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1)
-
#include <asm-generic/rqspinlock.h>
#endif /* _ASM_RQSPINLOCK_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 06/12] asm-generic: barrier: Add smp_cond_load_acquire_timeout()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (4 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 05/12] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 07/12] atomic: Add atomic_cond_read_*_timeout() Ankur Arora
` (5 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
Add the acquire variant of smp_cond_load_relaxed_timeout(). This
reuses the relaxed variant, with additional LOAD->LOAD ordering.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arch@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Haris Okanovic <harisokn@amazon.com>
Tested-by: Haris Okanovic <harisokn@amazon.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
include/asm-generic/barrier.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index e25592f9fcbf..d05d34bece0d 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -331,6 +331,32 @@ __done: \
})
#endif
+/**
+ * smp_cond_load_acquire_timeout() - (Spin) wait for cond with ACQUIRE ordering
+ * until a timeout expires.
+ * @ptr: pointer to the variable to wait on
+ * @cond: boolean expression to wait for
+ * @time_expr_ns: monotonic expression that evaluates to time in ns or,
+ * on failure, returns a negative value.
+ * @timeout_ns: timeout value in ns
+ * (Both of the above are assumed to be compatible with s64.)
+ *
+ * Equivalent to using smp_cond_load_acquire() on the condition variable with
+ * a timeout.
+ */
+#ifndef smp_cond_load_acquire_timeout
+#define smp_cond_load_acquire_timeout(ptr, cond_expr, \
+ time_expr_ns, timeout_ns) \
+({ \
+ __unqual_scalar_typeof(*ptr) _val; \
+ _val = smp_cond_load_relaxed_timeout(ptr, cond_expr, \
+ time_expr_ns, \
+ timeout_ns); \
+ smp_acquire__after_ctrl_dep(); \
+ (typeof(*ptr))_val; \
+})
+#endif
+
/*
* pmem_wmb() ensures that all stores for which the modification
* are written to persistent storage by preceding instructions have
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 07/12] atomic: Add atomic_cond_read_*_timeout()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (5 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 06/12] asm-generic: barrier: Add smp_cond_load_acquire_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 08/12] locking/atomic: scripts: build atomic_long_cond_read_*_timeout() Ankur Arora
` (4 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora,
Boqun Feng
Add atomic load wrappers, atomic_cond_read_*_timeout() and
atomic64_cond_read_*_timeout() for the cond-load timeout interfaces.
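A hypothetical caller sketch (not from this series; the 1us budget is
made up): wait for an atomic_t to become non-zero, with ACQUIRE
ordering on the final load:

    #include <linux/atomic.h>
    #include <linux/timekeeping.h>
    #include <linux/time64.h>

    static int wait_for_nonzero(atomic_t *v)
    {
            /* Returns the last value read; times out after 1us. */
            return atomic_cond_read_acquire_timeout(v, VAL != 0,
                                                    ktime_get_mono_fast_ns(),
                                                    1 * NSEC_PER_USEC);
    }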
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
include/linux/atomic.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 8dd57c3a99e9..5bcb86e07784 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -31,6 +31,16 @@
#define atomic64_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
#define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
+#define atomic_cond_read_acquire_timeout(v, c, e, t) \
+ smp_cond_load_acquire_timeout(&(v)->counter, (c), (e), (t))
+#define atomic_cond_read_relaxed_timeout(v, c, e, t) \
+ smp_cond_load_relaxed_timeout(&(v)->counter, (c), (e), (t))
+
+#define atomic64_cond_read_acquire_timeout(v, c, e, t) \
+ smp_cond_load_acquire_timeout(&(v)->counter, (c), (e), (t))
+#define atomic64_cond_read_relaxed_timeout(v, c, e, t) \
+ smp_cond_load_relaxed_timeout(&(v)->counter, (c), (e), (t))
+
/*
* The idea here is to build acquire/release variants by adding explicit
* barriers on top of the relaxed variant. In the case where the relaxed
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 08/12] locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (6 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 07/12] atomic: Add atomic_cond_read_*_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 09/12] bpf/rqspinlock: switch check_timeout() to a clock interface Ankur Arora
` (3 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora,
Boqun Feng
Add the atomic long wrappers for the cond-load timeout interfaces.
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
include/linux/atomic/atomic-long.h | 18 +++++++++++-------
scripts/atomic/gen-atomic-long.sh | 16 ++++++++++------
2 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/include/linux/atomic/atomic-long.h b/include/linux/atomic/atomic-long.h
index f86b29d90877..e6da0189cbe6 100644
--- a/include/linux/atomic/atomic-long.h
+++ b/include/linux/atomic/atomic-long.h
@@ -11,14 +11,18 @@
#ifdef CONFIG_64BIT
typedef atomic64_t atomic_long_t;
-#define ATOMIC_LONG_INIT(i) ATOMIC64_INIT(i)
-#define atomic_long_cond_read_acquire atomic64_cond_read_acquire
-#define atomic_long_cond_read_relaxed atomic64_cond_read_relaxed
+#define ATOMIC_LONG_INIT(i) ATOMIC64_INIT(i)
+#define atomic_long_cond_read_acquire atomic64_cond_read_acquire
+#define atomic_long_cond_read_relaxed atomic64_cond_read_relaxed
+#define atomic_long_cond_read_acquire_timeout atomic64_cond_read_acquire_timeout
+#define atomic_long_cond_read_relaxed_timeout atomic64_cond_read_relaxed_timeout
#else
typedef atomic_t atomic_long_t;
-#define ATOMIC_LONG_INIT(i) ATOMIC_INIT(i)
-#define atomic_long_cond_read_acquire atomic_cond_read_acquire
-#define atomic_long_cond_read_relaxed atomic_cond_read_relaxed
+#define ATOMIC_LONG_INIT(i) ATOMIC_INIT(i)
+#define atomic_long_cond_read_acquire atomic_cond_read_acquire
+#define atomic_long_cond_read_relaxed atomic_cond_read_relaxed
+#define atomic_long_cond_read_acquire_timeout atomic_cond_read_acquire_timeout
+#define atomic_long_cond_read_relaxed_timeout atomic_cond_read_relaxed_timeout
#endif
/**
@@ -1809,4 +1813,4 @@ raw_atomic_long_dec_if_positive(atomic_long_t *v)
}
#endif /* _LINUX_ATOMIC_LONG_H */
-// eadf183c3600b8b92b91839dd3be6bcc560c752d
+// 475f45a880d1625faa5116dcfd6e943e4dbe1cd5
diff --git a/scripts/atomic/gen-atomic-long.sh b/scripts/atomic/gen-atomic-long.sh
index 9826be3ba986..874643dc74bd 100755
--- a/scripts/atomic/gen-atomic-long.sh
+++ b/scripts/atomic/gen-atomic-long.sh
@@ -79,14 +79,18 @@ cat << EOF
#ifdef CONFIG_64BIT
typedef atomic64_t atomic_long_t;
-#define ATOMIC_LONG_INIT(i) ATOMIC64_INIT(i)
-#define atomic_long_cond_read_acquire atomic64_cond_read_acquire
-#define atomic_long_cond_read_relaxed atomic64_cond_read_relaxed
+#define ATOMIC_LONG_INIT(i) ATOMIC64_INIT(i)
+#define atomic_long_cond_read_acquire atomic64_cond_read_acquire
+#define atomic_long_cond_read_relaxed atomic64_cond_read_relaxed
+#define atomic_long_cond_read_acquire_timeout atomic64_cond_read_acquire_timeout
+#define atomic_long_cond_read_relaxed_timeout atomic64_cond_read_relaxed_timeout
#else
typedef atomic_t atomic_long_t;
-#define ATOMIC_LONG_INIT(i) ATOMIC_INIT(i)
-#define atomic_long_cond_read_acquire atomic_cond_read_acquire
-#define atomic_long_cond_read_relaxed atomic_cond_read_relaxed
+#define ATOMIC_LONG_INIT(i) ATOMIC_INIT(i)
+#define atomic_long_cond_read_acquire atomic_cond_read_acquire
+#define atomic_long_cond_read_relaxed atomic_cond_read_relaxed
+#define atomic_long_cond_read_acquire_timeout atomic_cond_read_acquire_timeout
+#define atomic_long_cond_read_relaxed_timeout atomic_cond_read_relaxed_timeout
#endif
EOF
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 09/12] bpf/rqspinlock: switch check_timeout() to a clock interface
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (7 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 08/12] locking/atomic: scripts: build atomic_long_cond_read_*_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 10/12] bpf/rqspinlock: Use smp_cond_load_acquire_timeout() Ankur Arora
` (2 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
check_timeout() gets the current time value and, depending on how
much time has passed, checks for deadlock or timeout, returning 0 on
success or a negative errno on deadlock or timeout.
Switch this out for a clock-style interface, where it functions as a
clock in the "lock-domain", returning the current time until a
deadlock or timeout occurs. Once a deadlock or timeout has occurred,
it stops functioning as a clock and returns an error.
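A minimal sketch of the resulting contract (illustrative; lock and ts
are the usual rqspinlock_t and struct rqspinlock_timeout in this file):

    s64 now = clock_deadlock(lock, _Q_LOCKED_MASK, &ts);

    if (now < 0)            /* -ETIMEDOUT or -EDEADLK: stop waiting */
            return now;
    /* otherwise 'now' is the current monotonic time in ns */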
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
kernel/bpf/rqspinlock.c | 41 +++++++++++++++++++++++++++--------------
1 file changed, 27 insertions(+), 14 deletions(-)
diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
index f7d0c8d4644e..ac9b3572e42f 100644
--- a/kernel/bpf/rqspinlock.c
+++ b/kernel/bpf/rqspinlock.c
@@ -196,8 +196,12 @@ static noinline int check_deadlock_ABBA(rqspinlock_t *lock, u32 mask)
return 0;
}
-static noinline int check_timeout(rqspinlock_t *lock, u32 mask,
- struct rqspinlock_timeout *ts)
+/*
+ * Returns current monotonic time in ns on success or, negative errno
+ * value on failure due to timeout expiration or detection of deadlock.
+ */
+static noinline s64 clock_deadlock(rqspinlock_t *lock, u32 mask,
+ struct rqspinlock_timeout *ts)
{
u64 prev = ts->cur;
u64 time;
@@ -207,7 +211,7 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask,
return -EDEADLK;
ts->cur = ktime_get_mono_fast_ns();
ts->timeout_end = ts->cur + ts->duration;
- return 0;
+ return (s64)ts->cur;
}
time = ktime_get_mono_fast_ns();
@@ -219,11 +223,15 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask,
* checks.
*/
if (prev + NSEC_PER_MSEC < time) {
+ int ret;
ts->cur = time;
- return check_deadlock_ABBA(lock, mask);
+ ret = check_deadlock_ABBA(lock, mask);
+ if (ret)
+ return ret;
+
}
- return 0;
+ return (s64)time;
}
/*
@@ -234,12 +242,12 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask,
#define RES_CHECK_TIMEOUT(ts, ret, mask) \
({ \
if (!(ts).spin++) \
- (ret) = check_timeout((lock), (mask), &(ts)); \
+ (ret) = clock_deadlock((lock), (mask), &(ts));\
(ret); \
})
#else
#define RES_CHECK_TIMEOUT(ts, ret, mask) \
- ({ (ret) = check_timeout((lock), (mask), &(ts)); })
+ ({ (ret) = clock_deadlock((lock), (mask), &(ts)); })
#endif
/*
@@ -261,7 +269,8 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask,
int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock)
{
struct rqspinlock_timeout ts;
- int val, ret = 0;
+ s64 ret = 0;
+ int val;
RES_INIT_TIMEOUT(ts);
/*
@@ -280,7 +289,7 @@ int __lockfunc resilient_tas_spin_lock(rqspinlock_t *lock)
val = atomic_read(&lock->val);
if (val || !atomic_try_cmpxchg(&lock->val, &val, 1)) {
- if (RES_CHECK_TIMEOUT(ts, ret, ~0u))
+ if (RES_CHECK_TIMEOUT(ts, ret, ~0u) < 0)
goto out;
cpu_relax();
goto retry;
@@ -339,6 +348,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
{
struct mcs_spinlock *prev, *next, *node;
struct rqspinlock_timeout ts;
+ s64 timeout_err = 0;
int idx, ret = 0;
u32 old, tail;
@@ -405,10 +415,10 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
*/
if (val & _Q_LOCKED_MASK) {
RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT);
- res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK));
+ res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, timeout_err, _Q_LOCKED_MASK) < 0);
}
- if (ret) {
+ if (timeout_err < 0) {
/*
* We waited for the locked bit to go back to 0, as the pending
* waiter, but timed out. We need to clear the pending bit since
@@ -420,6 +430,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
*/
clear_pending(lock);
lockevent_inc(rqspinlock_lock_timeout);
+ ret = timeout_err;
goto err_release_entry;
}
@@ -567,18 +578,19 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
*/
RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2);
val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) ||
- RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK));
+ RES_CHECK_TIMEOUT(ts, timeout_err, _Q_LOCKED_PENDING_MASK) < 0);
/* Disable queue destruction when we detect deadlocks. */
- if (ret == -EDEADLK) {
+ if (timeout_err == -EDEADLK) {
if (!next)
next = smp_cond_load_relaxed(&node->next, (VAL));
arch_mcs_spin_unlock_contended(&next->locked);
+ ret = timeout_err;
goto err_release_node;
}
waitq_timeout:
- if (ret) {
+ if (timeout_err < 0) {
/*
* If the tail is still pointing to us, then we are the final waiter,
* and are responsible for resetting the tail back to 0. Otherwise, if
@@ -608,6 +620,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
WRITE_ONCE(next->locked, RES_TIMEOUT_VAL);
}
lockevent_inc(rqspinlock_lock_timeout);
+ ret = timeout_err;
goto err_release_node;
}
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 10/12] bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (8 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 09/12] bpf/rqspinlock: switch check_timeout() to a clock interface Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 21:40 ` Alexei Starovoitov
2025-12-15 4:49 ` [PATCH v8 11/12] sched: add need-resched timed wait interface Ankur Arora
2025-12-15 4:49 ` [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait() Ankur Arora
11 siblings, 1 reply; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
Switch the conditional load interfaces used by rqspinlock over to
smp_cond_load_acquire_timeout() and its wrapper,
atomic_cond_read_acquire_timeout().
Both of these handle the timeout and amortize the time-check as
needed, so use clock_deadlock() directly instead of going through
RES_CHECK_TIMEOUT().
For correctness, however, we need to ensure that the timeout case in
smp_cond_load_acquire_timeout() always agrees with that in
clock_deadlock(), which returns -ETIMEDOUT.
For the most part this is fine, because smp_cond_load_acquire_timeout()
does not have an independent clock and does not do double reads of
clock_deadlock(), which could cause its internal state to go out of
sync with that of clock_deadlock().
There is, however, an edge case where clock_deadlock() checks for:
if (time > ts->timeout_end)
return -ETIMEDOUT;
while smp_cond_load_acquire_timeout() checks for:
__time_now = (time_expr_ns);
if (__time_now <= 0 || __time_now >= __time_end) {
VAL = READ_ONCE(*__PTR);
break;
}
This runs into a problem when (__time_now == __time_end) since
clock_deadlock() does not treat it as a timeout condition but
the second clause in the conditional above does.
So, add an equality check in clock_deadlock().
Finally, redefine SMP_TIMEOUT_POLL_COUNT to 16k, similar to the
spin-count used in RES_CHECK_TIMEOUT(). We only do this for non-arm64,
since arm64 uses a waiting implementation.
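To make the edge case concrete (a walk-through of the text above, not
new behaviour): suppose the deadline is T and ktime_get_mono_fast_ns()
reads exactly T. With the old '>' check, clock_deadlock() returns T
(not a timeout), while the generic loop sees a remaining time of
T - T == 0 and bails out; timeout_err never goes negative and the
timeout handling in the slowpath is skipped. Checking '>=' makes both
sides agree that T is a timeout.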
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
Notes:
- change the check in clock_deadlock()
kernel/bpf/rqspinlock.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)
diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
index ac9b3572e42f..2a361c4c7393 100644
--- a/kernel/bpf/rqspinlock.c
+++ b/kernel/bpf/rqspinlock.c
@@ -215,7 +215,7 @@ static noinline s64 clock_deadlock(rqspinlock_t *lock, u32 mask,
}
time = ktime_get_mono_fast_ns();
- if (time > ts->timeout_end)
+ if (time >= ts->timeout_end)
return -ETIMEDOUT;
/*
@@ -235,20 +235,14 @@ static noinline s64 clock_deadlock(rqspinlock_t *lock, u32 mask,
}
/*
- * Do not amortize with spins when res_smp_cond_load_acquire is defined,
- * as the macro does internal amortization for us.
+ * Amortize timeout check for busy-wait loops.
*/
-#ifndef res_smp_cond_load_acquire
#define RES_CHECK_TIMEOUT(ts, ret, mask) \
({ \
if (!(ts).spin++) \
(ret) = clock_deadlock((lock), (mask), &(ts));\
(ret); \
})
-#else
-#define RES_CHECK_TIMEOUT(ts, ret, mask) \
- ({ (ret) = clock_deadlock((lock), (mask), &(ts)); })
-#endif
/*
* Initialize the 'spin' member.
@@ -262,6 +256,18 @@ static noinline s64 clock_deadlock(rqspinlock_t *lock, u32 mask,
*/
#define RES_RESET_TIMEOUT(ts, _duration) ({ (ts).timeout_end = 0; (ts).duration = _duration; })
+/*
+ * Limit how often we invoke clock_deadlock() while spin-waiting in
+ * smp_cond_load_acquire_timeout() or atomic_cond_read_acquire_timeout().
+ *
+ * (ARM64 generally uses a waited implementation so we use the default
+ * value there.)
+ */
+#ifndef CONFIG_ARM64
+#undef SMP_TIMEOUT_POLL_COUNT
+#define SMP_TIMEOUT_POLL_COUNT (16*1024)
+#endif
+
/*
* Provide a test-and-set fallback for cases when queued spin lock support is
* absent from the architecture.
@@ -312,12 +318,6 @@ EXPORT_SYMBOL_GPL(resilient_tas_spin_lock);
*/
static DEFINE_PER_CPU_ALIGNED(struct qnode, rqnodes[_Q_MAX_NODES]);
-#ifndef res_smp_cond_load_acquire
-#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire(v, c)
-#endif
-
-#define res_atomic_cond_read_acquire(v, c) res_smp_cond_load_acquire(&(v)->counter, (c))
-
/**
* resilient_queued_spin_lock_slowpath - acquire the queued spinlock
* @lock: Pointer to queued spinlock structure
@@ -415,7 +415,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
*/
if (val & _Q_LOCKED_MASK) {
RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT);
- res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, timeout_err, _Q_LOCKED_MASK) < 0);
+ smp_cond_load_acquire_timeout(&lock->locked, !VAL,
+ (timeout_err = clock_deadlock(lock, _Q_LOCKED_MASK, &ts)),
+ ts.duration);
}
if (timeout_err < 0) {
@@ -577,8 +579,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
* us.
*/
RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT * 2);
- val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) ||
- RES_CHECK_TIMEOUT(ts, timeout_err, _Q_LOCKED_PENDING_MASK) < 0);
+ val = atomic_cond_read_acquire_timeout(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK),
+ (timeout_err = clock_deadlock(lock, _Q_LOCKED_PENDING_MASK, &ts)),
+ ts.duration);
/* Disable queue destruction when we detect deadlocks. */
if (timeout_err == -EDEADLK) {
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 11/12] sched: add need-resched timed wait interface
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (9 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 10/12] bpf/rqspinlock: Use smp_cond_load_acquire_timeout() Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait() Ankur Arora
11 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora,
Ingo Molnar
Add tif_bitset_relaxed_wait() (and tif_need_resched_relaxed_wait(),
which wraps it), which takes the thread_info bit and a timeout duration
as parameters and waits until the bit is set or the timeout expires.
The wait is implemented via smp_cond_load_relaxed_timeout(), which
essentially provides the pattern used in poll_idle(): spin in a loop
waiting for the flag to change until a timeout occurs.
tif_need_resched_relaxed_wait() allows us to abstract out the internals
of the waiting, scheduler-specific details etc.
Placed in linux/sched/idle.h instead of linux/thread_info.h to work
around recursive include hell.
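A minimal usage sketch (this mirrors what poll_idle() does in the next
patch; limit_ns and the surrounding setup are elided):

    /* Preemption/migration disabled, TIF polling flag already set. */
    bool resched = tif_need_resched_relaxed_wait(limit_ns);

    /* true: TIF_NEED_RESCHED was set; false: we gave up after limit_ns. */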
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
include/linux/sched/idle.h | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 8465ff1f20d1..6780ad760abb 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -3,6 +3,7 @@
#define _LINUX_SCHED_IDLE_H
#include <linux/sched.h>
+#include <linux/sched/clock.h>
enum cpu_idle_type {
__CPU_NOT_IDLE = 0,
@@ -113,4 +114,32 @@ static __always_inline void current_clr_polling(void)
}
#endif
+/*
+ * Caller needs to make sure that the thread context cannot be preempted
+ * or migrated, so current_thread_info() cannot change from under us.
+ *
+ * This also allows us to safely stay in the local_clock domain.
+ */
+static inline bool tif_bitset_relaxed_wait(int bit, s64 timeout_ns)
+{
+ unsigned int flags;
+
+ flags = smp_cond_load_relaxed_timeout(&current_thread_info()->flags,
+ (VAL & bit),
+ (s64)local_clock_noinstr(),
+ timeout_ns);
+ return flags & bit;
+}
+
+/**
+ * tif_need_resched_relaxed_wait() - Wait for need-resched being set with
+ * no ordering guarantees until a timeout expires.
+ *
+ * @timeout_ns: timeout value.
+ */
+static inline bool tif_need_resched_relaxed_wait(s64 timeout_ns)
+{
+ return tif_bitset_relaxed_wait(TIF_NEED_RESCHED, timeout_ns);
+}
+
#endif /* _LINUX_SCHED_IDLE_H */
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait()
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
` (10 preceding siblings ...)
2025-12-15 4:49 ` [PATCH v8 11/12] sched: add need-resched timed wait interface Ankur Arora
@ 2025-12-15 4:49 ` Ankur Arora
2025-12-15 11:11 ` Rafael J. Wysocki
2025-12-15 21:23 ` kernel test robot
11 siblings, 2 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-15 4:49 UTC (permalink / raw)
To: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Ankur Arora
The inner loop in poll_idle() polls over the thread_info flags,
waiting to see if the thread has TIF_NEED_RESCHED set. The loop
exits once the condition is met, or if the poll time limit has
been exceeded.
To minimize the number of instructions executed in each iteration,
the time check is rate-limited. In addition, each loop iteration
executes cpu_relax() which on certain platforms provides a hint to
the pipeline that the loop busy-waits, allowing the processor to
reduce power consumption.
Switch over to tif_need_resched_relaxed_wait() instead, since that
provides exactly that.
However, given that when running idle we want to minimize power
consumption, continue to depend on CONFIG_ARCH_HAS_CPU_RELAX, as that
serves as an indicator that the platform supports an optimized version
of tif_need_resched_relaxed_wait() (via
smp_cond_load_relaxed_timeout()).
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
Notes:
- use tif_need_resched_relaxed_wait() instead of
smp_cond_load_relaxed_timeout()
drivers/cpuidle/poll_state.c | 27 +++++----------------------
1 file changed, 5 insertions(+), 22 deletions(-)
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index c7524e4c522a..20136b3a08c2 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -6,41 +6,24 @@
#include <linux/cpuidle.h>
#include <linux/export.h>
#include <linux/irqflags.h>
-#include <linux/sched.h>
-#include <linux/sched/clock.h>
#include <linux/sched/idle.h>
#include <linux/sprintf.h>
#include <linux/types.h>
-#define POLL_IDLE_RELAX_COUNT 200
-
static int __cpuidle poll_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
- u64 time_start;
-
- time_start = local_clock_noinstr();
-
dev->poll_time_limit = false;
raw_local_irq_enable();
if (!current_set_polling_and_test()) {
- unsigned int loop_count = 0;
- u64 limit;
+ s64 limit;
+ bool nr_set;
- limit = cpuidle_poll_time(drv, dev);
+ limit = (s64)cpuidle_poll_time(drv, dev);
- while (!need_resched()) {
- cpu_relax();
- if (loop_count++ < POLL_IDLE_RELAX_COUNT)
- continue;
-
- loop_count = 0;
- if (local_clock_noinstr() - time_start > limit) {
- dev->poll_time_limit = true;
- break;
- }
- }
+ nr_set = tif_need_resched_relaxed_wait(limit);
+ dev->poll_time_limit = !nr_set;
}
raw_local_irq_disable();
--
2.31.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v8 05/12] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait()
2025-12-15 4:49 ` [PATCH v8 05/12] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait() Ankur Arora
@ 2025-12-15 5:14 ` bot+bpf-ci
0 siblings, 0 replies; 22+ messages in thread
From: bot+bpf-ci @ 2025-12-15 5:14 UTC (permalink / raw)
To: ankur.a.arora, linux-kernel, linux-arch, linux-arm-kernel,
linux-pm, bpf
Cc: arnd, catalin.marinas, will, peterz, akpm, mark.rutland, harisokn,
cl, ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, ankur.a.arora, ast,
andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 2893 bytes --]
> diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h
> index 9ea0a74e5892..a385603436e9 100644
> --- a/arch/arm64/include/asm/rqspinlock.h
> +++ b/arch/arm64/include/asm/rqspinlock.h
> @@ -3,91 +3,6 @@
> #define _ASM_RQSPINLOCK_H
>
> #include <asm/barrier.h>
> -
> -/*
> - * Hardcode res_smp_cond_load_acquire implementations for arm64 to a custom
> - * version based on [0]. In rqspinlock code, our conditional expression involves
> - * checking the value _and_ additionally a timeout. However, on arm64, the
> - * WFE-based implementation may never spin again if no stores occur to the
> - * locked byte in the lock word. As such, we may be stuck forever if
> - * event-stream based unblocking is not available on the platform for WFE spin
> - * loops (arch_timer_evtstrm_available).
^^^^
Can lock acquisition hang indefinitely on platforms without event-stream
support? The deleted implementation provided special handling for the
arch_timer_evtstrm_available() == false case using
__smp_cond_load_relaxed_spinwait(), which checked the timeout condition
every 200 iterations via cpu_relax().
After this removal, resilient_queued_spin_lock_slowpath() at line 408
calls:
res_smp_cond_load_acquire(&lock->locked,
!VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK));
This now falls back to plain smp_cond_load_acquire() (see the fallback
define in kernel/bpf/rqspinlock.c:307), which on arm64 ends up in
__cmpwait_relaxed(), an exclusive-load/WFE sequence. Without the event
stream, that WFE may never wake up if no stores occur to the lock word.
Even if WFE does wake up periodically, the RES_CHECK_TIMEOUT macro is
defined with amortization (checking timeout every ~256 iterations when
!(ts.spin++) wraps), which may be too sparse compared to the deleted
implementation's explicit check every 200 iterations.
This means the timeout and deadlock detection logic may not function
correctly, potentially causing lock acquisition to hang instead of
returning -ETIMEDOUT or -EDEADLK after the intended 250ms timeout
(RES_DEF_TIMEOUT).
Note: The next commit in the series (f150b57ec7c8 "bpf/rqspinlock: Use
smp_cond_load_acquire_timeout()") switches to proper timeout-aware
primitives, which resolves this issue.
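For reference, the amortized-check shape being compared here looks
roughly like the following (kernel-context C; the helper name and the
byte-sized lock word are hypothetical, this is not the actual
RES_CHECK_TIMEOUT macro):

#include <linux/errno.h>
#include <linux/ktime.h>	/* ktime_get_mono_fast_ns() */
#include <linux/processor.h>	/* cpu_relax() */
#include <linux/types.h>
#include <asm/barrier.h>	/* smp_load_acquire() */

/*
 * Spin on a byte-sized lock word, reading the (comparatively expensive)
 * clock only once every 256 iterations as the u8 counter wraps, so the
 * common path stays a load-acquire plus cpu_relax().
 */
static inline int spin_wait_unlocked(u8 *locked, u64 deadline_ns)
{
	u8 spin = 0;

	while (smp_load_acquire(locked)) {
		cpu_relax();
		if (!++spin && ktime_get_mono_fast_ns() > deadline_ns)
			return -ETIMEDOUT;
	}
	return 0;
}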
> - *
> - * Once support for smp_cond_load_acquire_timewait [0] lands, we can drop this
> - * copy-paste.
[ ... ]
> -#define res_smp_cond_load_acquire(v, c) smp_cond_load_acquire_timewait(v, c, 0, 1)
> -
> #include <asm-generic/rqspinlock.h>
>
> #endif /* _ASM_RQSPINLOCK_H */
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20221060323
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait()
2025-12-15 4:49 ` [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait() Ankur Arora
@ 2025-12-15 11:11 ` Rafael J. Wysocki
2025-12-16 6:55 ` Ankur Arora
2025-12-15 21:23 ` kernel test robot
1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2025-12-15 11:11 UTC (permalink / raw)
To: Ankur Arora
Cc: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf, arnd,
catalin.marinas, will, peterz, akpm, mark.rutland, harisokn, cl,
ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk
On Mon, Dec 15, 2025 at 5:55 AM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
> The inner loop in poll_idle() polls over the thread_info flags,
> waiting to see if the thread has TIF_NEED_RESCHED set. The loop
> exits once the condition is met, or if the poll time limit has
> been exceeded.
>
> To minimize the number of instructions executed in each iteration,
> the time check is rate-limited. In addition, each loop iteration
> executes cpu_relax() which on certain platforms provides a hint to
> the pipeline that the loop busy-waits, allowing the processor to
> reduce power consumption.
>
> Switch over to tif_need_resched_relaxed_wait() instead, since that
> provides exactly that.
>
> However, given that when running in idle we want to minimize our power
> consumption, continue to depend on CONFIG_ARCH_HAS_CPU_RELAX as that
> serves as an indicator that the platform supports an optimized version
> of tif_need_resched_relaxed_wait() (via
> smp_cond_load_acquire_timeout()).
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: linux-pm@vger.kernel.org
> Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>
> Notes:
> - use tif_need_resched_relaxed_wait() instead of
> smp_cond_load_relaxed_timeout()
>
> drivers/cpuidle/poll_state.c | 27 +++++----------------------
> 1 file changed, 5 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index c7524e4c522a..20136b3a08c2 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -6,41 +6,24 @@
> #include <linux/cpuidle.h>
> #include <linux/export.h>
> #include <linux/irqflags.h>
> -#include <linux/sched.h>
> -#include <linux/sched/clock.h>
> #include <linux/sched/idle.h>
> #include <linux/sprintf.h>
> #include <linux/types.h>
>
> -#define POLL_IDLE_RELAX_COUNT 200
> -
> static int __cpuidle poll_idle(struct cpuidle_device *dev,
> struct cpuidle_driver *drv, int index)
> {
> - u64 time_start;
> -
> - time_start = local_clock_noinstr();
> -
> dev->poll_time_limit = false;
>
> raw_local_irq_enable();
> if (!current_set_polling_and_test()) {
> - unsigned int loop_count = 0;
> - u64 limit;
> + s64 limit;
> + bool nr_set;
It doesn't look like the nr_set variable is really needed.
>
> - limit = cpuidle_poll_time(drv, dev);
> + limit = (s64)cpuidle_poll_time(drv, dev);
Is the explicit cast needed to suppress a warning? If not, I'd drop it.
>
> - while (!need_resched()) {
> - cpu_relax();
> - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> - continue;
> -
> - loop_count = 0;
> - if (local_clock_noinstr() - time_start > limit) {
> - dev->poll_time_limit = true;
> - break;
> - }
> - }
> + nr_set = tif_need_resched_relaxed_wait(limit);
> + dev->poll_time_limit = !nr_set;
This can be
dev->poll_time_limit = !tif_need_resched_relaxed_wait(limit);
> }
> raw_local_irq_disable();
>
> --
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 03/12] arm64/delay: move some constants out to a separate header
2025-12-15 4:49 ` [PATCH v8 03/12] arm64/delay: move some constants out to a separate header Ankur Arora
@ 2025-12-15 17:59 ` kernel test robot
2025-12-19 4:52 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-12-15 17:59 UTC (permalink / raw)
To: Ankur Arora, linux-kernel, linux-arch, linux-arm-kernel, linux-pm,
bpf
Cc: oe-kbuild-all, arnd, catalin.marinas, will, peterz, akpm,
mark.rutland, harisokn, cl, ast, rafael, daniel.lezcano, memxor,
zhenglifeng1, xueshuai, joao.m.martins, boris.ostrovsky,
konrad.wilk, Ankur Arora, Bjorn Andersson, Konrad Dybcio,
Christoph Lameter
Hi Ankur,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
[also build test ERROR on bpf/master linus/master arnd-asm-generic/master v6.19-rc1 next-20251215]
[cannot apply to arm64/for-next/core bpf-next/net]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ankur-Arora/asm-generic-barrier-Add-smp_cond_load_relaxed_timeout/20251215-130116
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20251215044919.460086-4-ankur.a.arora%40oracle.com
patch subject: [PATCH v8 03/12] arm64/delay: move some constants out to a separate header
config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20251216/202512160125.4rEePFm5-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251216/202512160125.4rEePFm5-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512160125.4rEePFm5-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/soc/qcom/rpmh-rsc.c:29:10: fatal error: asm/delay-const.h: No such file or directory
29 | #include <asm/delay-const.h>
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
vim +29 drivers/soc/qcom/rpmh-rsc.c
8
9 #include <linux/atomic.h>
10 #include <linux/cpu_pm.h>
11 #include <linux/delay.h>
12 #include <linux/interrupt.h>
13 #include <linux/io.h>
14 #include <linux/iopoll.h>
15 #include <linux/kernel.h>
16 #include <linux/ktime.h>
17 #include <linux/list.h>
18 #include <linux/module.h>
19 #include <linux/notifier.h>
20 #include <linux/of.h>
21 #include <linux/of_irq.h>
22 #include <linux/of_platform.h>
23 #include <linux/platform_device.h>
24 #include <linux/pm_domain.h>
25 #include <linux/pm_runtime.h>
26 #include <linux/slab.h>
27 #include <linux/spinlock.h>
28 #include <linux/wait.h>
> 29 #include <asm/delay-const.h>
30
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait()
2025-12-15 4:49 ` [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait() Ankur Arora
2025-12-15 11:11 ` Rafael J. Wysocki
@ 2025-12-15 21:23 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-12-15 21:23 UTC (permalink / raw)
To: Ankur Arora, linux-kernel, linux-arch, linux-arm-kernel, linux-pm,
bpf
Cc: oe-kbuild-all, arnd, catalin.marinas, will, peterz, akpm,
mark.rutland, harisokn, cl, ast, rafael, daniel.lezcano, memxor,
zhenglifeng1, xueshuai, joao.m.martins, boris.ostrovsky,
konrad.wilk, Ankur Arora
Hi Ankur,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bpf-next/master]
[also build test WARNING on bpf/master linus/master v6.19-rc1 next-20251215]
[cannot apply to arm64/for-next/core bpf-next/net arnd-asm-generic/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ankur-Arora/asm-generic-barrier-Add-smp_cond_load_relaxed_timeout/20251215-130116
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20251215044919.460086-13-ankur.a.arora%40oracle.com
patch subject: [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait()
config: x86_64-randconfig-005-20251215 (https://download.01.org/0day-ci/archive/20251216/202512160519.pKMkOXsP-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251216/202512160519.pKMkOXsP-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512160519.pKMkOXsP-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> vmlinux.o: warning: objtool: poll_idle+0x3b: call to tif_need_resched_relaxed_wait() leaves .noinstr.text section
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 10/12] bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
2025-12-15 4:49 ` [PATCH v8 10/12] bpf/rqspinlock: Use smp_cond_load_acquire_timeout() Ankur Arora
@ 2025-12-15 21:40 ` Alexei Starovoitov
2025-12-16 7:34 ` Ankur Arora
0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2025-12-15 21:40 UTC (permalink / raw)
To: Ankur Arora
Cc: LKML, linux-arch, linux-arm-kernel, Linux Power Management, bpf,
Arnd Bergmann, Catalin Marinas, Will Deacon, Peter Zijlstra,
Andrew Morton, Mark Rutland, harisokn, Christoph Lameter,
Alexei Starovoitov, Rafael J. Wysocki, Daniel Lezcano,
Kumar Kartikeya Dwivedi, zhenglifeng1, xueshuai, joao.m.martins,
Boris Ostrovsky, konrad.wilk
On Sun, Dec 14, 2025 at 8:51 PM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
> /**
> * resilient_queued_spin_lock_slowpath - acquire the queued spinlock
> * @lock: Pointer to queued spinlock structure
> @@ -415,7 +415,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
> */
> if (val & _Q_LOCKED_MASK) {
> RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT);
> - res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, timeout_err, _Q_LOCKED_MASK) < 0);
> + smp_cond_load_acquire_timeout(&lock->locked, !VAL,
> + (timeout_err = clock_deadlock(lock, _Q_LOCKED_MASK, &ts)),
> + ts.duration);
I'm pretty sure we already discussed this and pointed out that
this approach is not acceptable.
We cannot call ktime_get_mono_fast_ns() first.
That's why RES_CHECK_TIMEOUT() exists and it does
if (!(ts).spin++)
before doing the first check_timeout() that will do ktime_get_mono_fast_ns().
Above is a performance critical lock acquisition path where
pending is spinning on the lock word waiting for the owner to
release the lock.
Adding unconditional ktime_get_mono_fast_ns() will destroy
performance for quick critical section.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait()
2025-12-15 11:11 ` Rafael J. Wysocki
@ 2025-12-16 6:55 ` Ankur Arora
0 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-16 6:55 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Ankur Arora, linux-kernel, linux-arch, linux-arm-kernel, linux-pm,
bpf, arnd, catalin.marinas, will, peterz, akpm, mark.rutland,
harisokn, cl, ast, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk
Rafael J. Wysocki <rafael@kernel.org> writes:
> On Mon, Dec 15, 2025 at 5:55 AM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>>
>> The inner loop in poll_idle() polls over the thread_info flags,
>> waiting to see if the thread has TIF_NEED_RESCHED set. The loop
>> exits once the condition is met, or if the poll time limit has
>> been exceeded.
>>
>> To minimize the number of instructions executed in each iteration,
>> the time check is rate-limited. In addition, each loop iteration
>> executes cpu_relax() which on certain platforms provides a hint to
>> the pipeline that the loop busy-waits, allowing the processor to
>> reduce power consumption.
>>
>> Switch over to tif_need_resched_relaxed_wait() instead, since that
>> provides exactly that.
>>
>> However, given that when running in idle we want to minimize our power
>> consumption, continue to depend on CONFIG_ARCH_HAS_CPU_RELAX as that
>> serves as an indicator that the platform supports an optimized version
>> of tif_need_resched_relaxed_wait() (via
>> smp_cond_load_acquire_timeout()).
>>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
>> Cc: linux-pm@vger.kernel.org
>> Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>
>> Notes:
>> - use tif_need_resched_relaxed_wait() instead of
>> smp_cond_load_relaxed_timeout()
>>
>> drivers/cpuidle/poll_state.c | 27 +++++----------------------
>> 1 file changed, 5 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> index c7524e4c522a..20136b3a08c2 100644
>> --- a/drivers/cpuidle/poll_state.c
>> +++ b/drivers/cpuidle/poll_state.c
>> @@ -6,41 +6,24 @@
>> #include <linux/cpuidle.h>
>> #include <linux/export.h>
>> #include <linux/irqflags.h>
>> -#include <linux/sched.h>
>> -#include <linux/sched/clock.h>
>> #include <linux/sched/idle.h>
>> #include <linux/sprintf.h>
>> #include <linux/types.h>
>>
>> -#define POLL_IDLE_RELAX_COUNT 200
>> -
>> static int __cpuidle poll_idle(struct cpuidle_device *dev,
>> struct cpuidle_driver *drv, int index)
>> {
>> - u64 time_start;
>> -
>> - time_start = local_clock_noinstr();
>> -
>> dev->poll_time_limit = false;
>>
>> raw_local_irq_enable();
>> if (!current_set_polling_and_test()) {
>> - unsigned int loop_count = 0;
>> - u64 limit;
>> + s64 limit;
>> + bool nr_set;
>
> It doesn't look like the nr_set variable is really needed.
>
>>
>> - limit = cpuidle_poll_time(drv, dev);
>> + limit = (s64)cpuidle_poll_time(drv, dev);
>
> Is the explicit cast needed to suppress a warning? If not, I'd drop it.
Ack.
>>
>> - while (!need_resched()) {
>> - cpu_relax();
>> - if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> - continue;
>> -
>> - loop_count = 0;
>> - if (local_clock_noinstr() - time_start > limit) {
>> - dev->poll_time_limit = true;
>> - break;
>> - }
>> - }
>> + nr_set = tif_need_resched_relaxed_wait(limit);
>> + dev->poll_time_limit = !nr_set;
>
> This can be
>
> dev->poll_time_limit = !tif_need_resched_relaxed_wait(limit);
Yeah that looks pretty clear.
Thanks!
--
ankur
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 10/12] bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
2025-12-15 21:40 ` Alexei Starovoitov
@ 2025-12-16 7:34 ` Ankur Arora
0 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-16 7:34 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Ankur Arora, LKML, linux-arch, linux-arm-kernel,
Linux Power Management, bpf, Arnd Bergmann, Catalin Marinas,
Will Deacon, Peter Zijlstra, Andrew Morton, Mark Rutland,
harisokn, Christoph Lameter, Alexei Starovoitov,
Rafael J. Wysocki, Daniel Lezcano, Kumar Kartikeya Dwivedi,
zhenglifeng1, xueshuai, joao.m.martins, Boris Ostrovsky,
konrad.wilk
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Sun, Dec 14, 2025 at 8:51 PM Ankur Arora <ankur.a.arora@oracle.com> wrote:
>>
>> /**
>> * resilient_queued_spin_lock_slowpath - acquire the queued spinlock
>> * @lock: Pointer to queued spinlock structure
>> @@ -415,7 +415,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
>> */
>> if (val & _Q_LOCKED_MASK) {
>> RES_RESET_TIMEOUT(ts, RES_DEF_TIMEOUT);
>> - res_smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, timeout_err, _Q_LOCKED_MASK) < 0);
>> + smp_cond_load_acquire_timeout(&lock->locked, !VAL,
>> + (timeout_err = clock_deadlock(lock, _Q_LOCKED_MASK, &ts)),
>> + ts.duration);
>
> I'm pretty sure we already discussed this and pointed out that
> this approach is not acceptable.
Where? I don't see any mail on this at all.
In any case your technical point below is quite reasonable. It's better
to lead with that instead of peremptorily declaring what you find
acceptable and what not.
> We cannot call ktime_get_mono_fast_ns() first.
> That's why RES_CHECK_TIMEOUT() exists and it does
> if (!(ts).spin++)
> before doing the first check_timeout() that will do ktime_get_mono_fast_ns().
> Above is a performance critical lock acquisition path where
> pending is spinning on the lock word waiting for the owner to
> release the lock.
> Adding unconditional ktime_get_mono_fast_ns() will destroy
> performance for quick critical section.
Yes that makes sense.
This is not ideal, but how about something like this:
#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
time_expr_ns, timeout_ns) \
({ \
typeof(ptr) __PTR = (ptr); \
__unqual_scalar_typeof(*ptr) VAL; \
u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
s64 __timeout = (s64)timeout_ns; \
s64 __time_now, __time_end = 0; \
\
for (;;) { \
VAL = READ_ONCE(*__PTR); \
if (cond_expr) \
break; \
cpu_poll_relax(__PTR, VAL, __timeout); \
if (++__n < __spin) \
continue; \
__time_now = (s64)(time_expr_ns); \
if (unlikely(__time_end == 0)) \
__time_end = __time_now + __timeout; \
__timeout = __time_end - __time_now; \
if (__time_now <= 0 || __timeout <= 0) { \
VAL = READ_ONCE(*__PTR); \
break; \
} \
__n = 0; \
} \
\
(typeof(*ptr))VAL; \
})
The problem with this version is that there's no way to know how much
time has passed in the first cpu_poll_relax(). So, for the waiting case,
this has a built-in overshoot of up to __timeout for WFET, ~100us for
WFE, and ~2us for x86 cpu_relax.
Of course we could specify a small __timeout value for the first
iteration, which would help WFET. But that is more complexity.
Anyway, let me take another look at this tomorrow.
Opinions?
--
ankur
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 03/12] arm64/delay: move some constants out to a separate header
2025-12-15 4:49 ` [PATCH v8 03/12] arm64/delay: move some constants out to a separate header Ankur Arora
2025-12-15 17:59 ` kernel test robot
@ 2025-12-19 4:52 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-12-19 4:52 UTC (permalink / raw)
To: Ankur Arora, linux-kernel, linux-arch, linux-arm-kernel, linux-pm,
bpf
Cc: llvm, oe-kbuild-all, arnd, catalin.marinas, will, peterz, akpm,
mark.rutland, harisokn, cl, ast, rafael, daniel.lezcano, memxor,
zhenglifeng1, xueshuai, joao.m.martins, boris.ostrovsky,
konrad.wilk, Ankur Arora, Bjorn Andersson, Konrad Dybcio,
Christoph Lameter
Hi Ankur,
kernel test robot noticed the following build errors:
[auto build test ERROR on bpf-next/master]
[also build test ERROR on bpf/master linus/master arnd-asm-generic/master v6.19-rc1 next-20251218]
[cannot apply to arm64/for-next/core bpf-next/net]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ankur-Arora/asm-generic-barrier-Add-smp_cond_load_relaxed_timeout/20251215-130116
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20251215044919.460086-4-ankur.a.arora%40oracle.com
patch subject: [PATCH v8 03/12] arm64/delay: move some constants out to a separate header
config: x86_64-buildonly-randconfig-004-20251215 (https://download.01.org/0day-ci/archive/20251219/202512191227.TXFkHrQf-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251219/202512191227.TXFkHrQf-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512191227.TXFkHrQf-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/soc/qcom/rpmh-rsc.c:29:10: fatal error: 'asm/delay-const.h' file not found
29 | #include <asm/delay-const.h>
| ^~~~~~~~~~~~~~~~~~~
1 error generated.
vim +29 drivers/soc/qcom/rpmh-rsc.c
8
9 #include <linux/atomic.h>
10 #include <linux/cpu_pm.h>
11 #include <linux/delay.h>
12 #include <linux/interrupt.h>
13 #include <linux/io.h>
14 #include <linux/iopoll.h>
15 #include <linux/kernel.h>
16 #include <linux/ktime.h>
17 #include <linux/list.h>
18 #include <linux/module.h>
19 #include <linux/notifier.h>
20 #include <linux/of.h>
21 #include <linux/of_irq.h>
22 #include <linux/of_platform.h>
23 #include <linux/platform_device.h>
24 #include <linux/pm_domain.h>
25 #include <linux/pm_runtime.h>
26 #include <linux/slab.h>
27 #include <linux/spinlock.h>
28 #include <linux/wait.h>
> 29 #include <asm/delay-const.h>
30
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
2025-12-15 4:49 ` [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
@ 2025-12-22 7:47 ` Ankur Arora
0 siblings, 0 replies; 22+ messages in thread
From: Ankur Arora @ 2025-12-22 7:47 UTC (permalink / raw)
To: Ankur Arora
Cc: linux-kernel, linux-arch, linux-arm-kernel, linux-pm, bpf, arnd,
catalin.marinas, will, peterz, akpm, mark.rutland, harisokn, cl,
ast, rafael, daniel.lezcano, memxor, zhenglifeng1, xueshuai,
joao.m.martins, boris.ostrovsky, konrad.wilk, Alexei Starovoitov
Ankur Arora <ankur.a.arora@oracle.com> writes:
> Add smp_cond_load_relaxed_timeout(), which extends
> smp_cond_load_relaxed() to allow waiting for a duration.
>
> We loop around waiting for the condition variable to change while
> periodically doing a time-check. The loop uses cpu_poll_relax()
> to slow down the busy-waiting, which, unless overridden by the
> architecture code, amounts to a cpu_relax().
>
> Note that there are two ways for the time-check to fail: either we
> have timed out, or time_expr_ns returns an invalid value (negative or
> zero).
>
> The number of times we spin before doing the time-check is specified
> by SMP_TIMEOUT_POLL_COUNT (chosen to be 200 by default) which,
> assuming each cpu_poll_relax() iteration takes ~20-30 cycles (measured
> on a variety of x86 platforms), amounts to ~4000-6000 cycles between
> time-checks.
>
> That is also the outer limit of the overshoot when working with the
> parameters above. This might be higher or lower depending on the
> implementation of cpu_poll_relax() across architectures.
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: linux-arch@vger.kernel.org
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>
> Notes:
>
> - the interface now breaks the time_check_expr into two parts:
> time_expr_ns (evaluates to current time) and remaining_ns. The main
> reason for doing this was to support WFET and similar primitives
> which can do timed waiting.
>
> - cpu_poll_relax() now takes an additional parameter to handle that.
>
> - time_expr_ns can now return failure, which needs a little more
> reorganization. This was needed because the rqspinlock check_timeout()
> logic mapped naturally to the unified check in time_check_expr.
> Breaking up the time_check_expr, however, needed check_timeout() to be
> split into a clock interface (which can fail on deadlock or its
> internal timeout check) and a separate timeout duration.
>
> - given the changes in logic I've removed Catalin and Haris' R-by and
> Tested-by.
>
> include/asm-generic/barrier.h | 58 +++++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index d4f581c1e21d..e25592f9fcbf 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -273,6 +273,64 @@ do { \
> })
> #endif
>
> +/*
> + * Number of times we iterate in the loop before doing the time check.
> + */
> +#ifndef SMP_TIMEOUT_POLL_COUNT
> +#define SMP_TIMEOUT_POLL_COUNT 200
> +#endif
> +
> +#ifndef cpu_poll_relax
> +#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
> +#endif
> +
> +/**
> + * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
> + * guarantees until a timeout expires.
> + * @ptr: pointer to the variable to wait on
> + * @cond_expr: boolean expression to wait for
> + * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
> + * on failure, returns a negative value.
> + * @timeout_ns: timeout value in ns
> + * (Both of the above are assumed to be compatible with s64.)
> + *
> + * Equivalent to using READ_ONCE() on the condition variable.
> + */
> +#ifndef smp_cond_load_relaxed_timeout
> +#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
> + time_expr_ns, timeout_ns) \
> +({ \
> + __label__ __out, __done; \
> + typeof(ptr) __PTR = (ptr); \
> + __unqual_scalar_typeof(*ptr) VAL; \
> + u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
> + s64 __time_now = (s64)(time_expr_ns); \
> + s64 __timeout = (s64)timeout_ns; \
> + s64 __time_end = __time_now + __timeout; \
> + \
> + if (__time_now <= 0) \
> + goto __out; \
> + \
> + for (;;) { \
> + VAL = READ_ONCE(*__PTR); \
> + if (cond_expr) \
> + goto __done; \
> + cpu_poll_relax(__PTR, VAL, __timeout); \
> + if (++__n < __spin) \
> + continue; \
> + __time_now = (s64)(time_expr_ns); \
> + __timeout = __time_end - __time_now; \
> + if (__time_now <= 0 || __timeout <= 0) \
> + goto __out; \
> + __n = 0; \
> + } \
> +__out: \
> + VAL = READ_ONCE(*__PTR); \
> +__done: \
> + (typeof(*ptr))VAL; \
> +})
> +#endif
> +
> /*
> * pmem_wmb() ensures that all stores for which the modification
> * are written to persistent storage by preceding instructions have
Alexei Starovoitov pointed out in his review comment on patch-10 that
evaluating time_expr_ns on entry means that this is unusable in the
fastpath of something like lock acquisition (as rqspinlock does).
Something like the following should work better (it also gets rid of the
gotos). The only negative is that in the slowpath this would have a
larger overshoot, but I think that's a reasonable tradeoff.
I'll be sending out the next version with this changed but just wanted to
gauge any preliminary opinions on this.
Thanks
Ankur
---
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index d4f581c1e21d..278eb51cf354 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -273,6 +273,73 @@ do { \
})
#endif
+/*
+ * Number of times we iterate in the loop before doing the time check.
+ */
+#ifndef SMP_TIMEOUT_POLL_COUNT
+#define SMP_TIMEOUT_POLL_COUNT 200
+#endif
+
+#ifndef cpu_poll_relax
+#define cpu_poll_relax(ptr, val, timeout_ns) cpu_relax()
+#endif
+
+/**
+ * smp_cond_load_relaxed_timeout() - (Spin) wait for cond with no ordering
+ * guarantees until a timeout expires.
+ * @ptr: pointer to the variable to wait on
+ * @cond: boolean expression to wait for
+ * @time_expr_ns: expression that evaluates to monotonic time (in ns) or,
+ * on failure, returns a negative value.
+ * @timeout_ns: timeout value in ns
+ * (Both of the above are assumed to be compatible with s64.)
+ *
+ * Equivalent to using READ_ONCE() on the condition variable.
+ *
+ * Note on timeout overshoot: in the fastpath, we want this to be as
+ * close in performance to smp_cond_load_relaxed() as possible. To do
+ * that, we defer evaluating the potentially expensive @time_expr_ns
+ * until it is actually needed.
+ *
+ * This means that some hardware-dependent duration will always have
+ * passed inside cpu_poll_relax() by the time of the first evaluation.
+ * In addition, cpu_poll_relax() is not guaranteed to return exactly
+ * at the timeout boundary.
+ *
+ * In sum, expect timeout overshoot when we return due to expiration of
+ * the timeout.
+ */
+#ifndef smp_cond_load_relaxed_timeout
+#define smp_cond_load_relaxed_timeout(ptr, cond_expr, \
+ time_expr_ns, timeout_ns) \
+({ \
+ typeof(ptr) __PTR = (ptr); \
+ __unqual_scalar_typeof(*ptr) VAL; \
+ u32 __n = 0, __spin = SMP_TIMEOUT_POLL_COUNT; \
+ s64 __timeout = (s64)timeout_ns; \
+ s64 __time_now, __time_end = 0; \
+ \
+ for (;;) { \
+ VAL = READ_ONCE(*__PTR); \
+ if (cond_expr) \
+ break; \
+ cpu_poll_relax(__PTR, VAL, __timeout); \
+ if (++__n < __spin) \
+ continue; \
+ __time_now = (s64)(time_expr_ns); \
+ if (unlikely(__time_end == 0)) \
+ __time_end = __time_now + __timeout; \
+ __timeout = __time_end - __time_now; \
+ if (__time_now <= 0 || __timeout <= 0) { \
+ VAL = READ_ONCE(*__PTR); \
+ break; \
+ } \
+ __n = 0; \
+ } \
+ (typeof(*ptr))VAL; \
+})
+#endif
+
/*
* pmem_wmb() ensures that all stores for which the modification
* are written to persistent storage by preceding instructions have
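For completeness, a caller-side sketch (purely illustrative; 'flag',
wait_for_flag() and the 1 ms budget are made up, not taken from this
series):

#include <linux/sched/clock.h>	/* local_clock() */
#include <linux/time64.h>	/* NSEC_PER_MSEC */
#include <linux/types.h>
#include <asm/barrier.h>	/* smp_cond_load_relaxed_timeout() */

static u8 flag;			/* made-up condition variable */

/*
 * Wait up to 1 ms for 'flag' to become non-zero, using local_clock()
 * as the monotonic time source. The macro returns the last value it
 * read, so a zero return means we gave up before seeing it set.
 */
static bool wait_for_flag(void)
{
	return smp_cond_load_relaxed_timeout(&flag, VAL != 0,
					     local_clock(),
					     NSEC_PER_MSEC);
}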
^ permalink raw reply related [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-12-22 7:47 UTC | newest]
Thread overview: 22+ messages
-- links below jump to the message on this page --
2025-12-15 4:49 [PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 01/12] asm-generic: barrier: Add smp_cond_load_relaxed_timeout() Ankur Arora
2025-12-22 7:47 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 02/12] arm64: barrier: Support smp_cond_load_relaxed_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 03/12] arm64/delay: move some constants out to a separate header Ankur Arora
2025-12-15 17:59 ` kernel test robot
2025-12-19 4:52 ` kernel test robot
2025-12-15 4:49 ` [PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 05/12] arm64: rqspinlock: Remove private copy of smp_cond_load_acquire_timewait() Ankur Arora
2025-12-15 5:14 ` bot+bpf-ci
2025-12-15 4:49 ` [PATCH v8 06/12] asm-generic: barrier: Add smp_cond_load_acquire_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 07/12] atomic: Add atomic_cond_read_*_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 08/12] locking/atomic: scripts: build atomic_long_cond_read_*_timeout() Ankur Arora
2025-12-15 4:49 ` [PATCH v8 09/12] bpf/rqspinlock: switch check_timeout() to a clock interface Ankur Arora
2025-12-15 4:49 ` [PATCH v8 10/12] bpf/rqspinlock: Use smp_cond_load_acquire_timeout() Ankur Arora
2025-12-15 21:40 ` Alexei Starovoitov
2025-12-16 7:34 ` Ankur Arora
2025-12-15 4:49 ` [PATCH v8 11/12] sched: add need-resched timed wait interface Ankur Arora
2025-12-15 4:49 ` [PATCH v8 12/12] cpuidle/poll_state: Wait for need-resched via tif_need_resched_relaxed_wait() Ankur Arora
2025-12-15 11:11 ` Rafael J. Wysocki
2025-12-16 6:55 ` Ankur Arora
2025-12-15 21:23 ` kernel test robot